Data Platform Migration Guide
Date: 2025-01-27
Purpose: Guide for migrating projects to the data platform
Status: Complete
Overview
This guide explains how to migrate projects to the centralized data platform, which stores data in MinIO and is accessed through its S3-compatible API.
Prerequisites
- MinIO deployed and configured
- Buckets created
- Access credentials configured
- Data catalog set up (optional)
Migration Steps
Step 1: Install S3 Client
```bash
pnpm add @aws-sdk/client-s3
```
Step 2: Configure S3 Client
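```typescript
// Shared client used by the snippets in the following steps.
import { S3Client } from '@aws-sdk/client-s3';

export const s3Client = new S3Client({
  endpoint: process.env.MINIO_ENDPOINT || 'http://minio:9000',
  // The SDK requires a region; MinIO's default region is us-east-1.
  region: 'us-east-1',
  credentials: {
    // The minioadmin fallbacks are MinIO's default root credentials:
    // acceptable for local development, never for production.
    accessKeyId: process.env.MINIO_ACCESS_KEY || 'minioadmin',
    secretAccessKey: process.env.MINIO_SECRET_KEY || 'minioadmin',
  },
  forcePathStyle: true, // Required for MinIO (path-style bucket URLs)
});
```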
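The `minioadmin` fallback is convenient locally, but it is risky if it ever reaches production. The sketch below fails fast when credentials are missing and does not fall back. It assumes the app signals production with `NODE_ENV === 'production'`, which this guide does not specify. The helper name `loadMinioSettings` is hypothetical.

```typescript
// Shape of the settings passed to `new S3Client(...)` in Step 2.
interface MinioSettings {
  endpoint: string;
  region: string;
  credentials: { accessKeyId: string; secretAccessKey: string };
  forcePathStyle: boolean;
}

type Env = Record<string, string | undefined>;

// Hypothetical helper: reads MinIO settings from an environment map.
// Outside production the Step 2 defaults still apply; in production
// (assumed to mean NODE_ENV === 'production') every variable is required.
function loadMinioSettings(env: Env): MinioSettings {
  const isProduction = env.NODE_ENV === 'production';

  const read = (name: string, devDefault: string): string => {
    const value = env[name];
    if (value) return value;
    if (isProduction) {
      throw new Error(`Missing required environment variable ${name}`);
    }
    return devDefault;
  };

  return {
    endpoint: read('MINIO_ENDPOINT', 'http://minio:9000'),
    region: 'us-east-1',
    credentials: {
      accessKeyId: read('MINIO_ACCESS_KEY', 'minioadmin'),
      secretAccessKey: read('MINIO_SECRET_KEY', 'minioadmin'),
    },
    forcePathStyle: true, // MinIO is addressed with path-style URLs
  };
}
```

The client would then be created with `new S3Client(loadMinioSettings(process.env))`. A misconfigured production deployment then stops at startup and does not quietly use default credentials.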
Step 3: Upload Data
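```typescript
import { PutObjectCommand } from '@aws-sdk/client-s3';
// s3Client is the client configured in Step 2.

// Uploads a buffer as an object. Pass a content type that matches the data
// (e.g. 'application/json', 'text/csv'); the default is generic binary.
async function uploadData(
  bucket: string,
  key: string,
  data: Buffer,
  contentType = 'application/octet-stream',
): Promise<void> {
  await s3Client.send(
    new PutObjectCommand({
      Bucket: bucket,
      Key: key,
      Body: data,
      ContentType: contentType,
    }),
  );
}
```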
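The platform holds Parquet, JSON, and CSV objects (see Data Formats below), so each upload should carry a matching `ContentType`. The sketch below shows a hypothetical helper, `contentTypeFor`, that derives the type from the key's extension. Parquet is stored as `application/octet-stream` here as a neutral choice. That is an assumption: change it if your consumers expect a Parquet-specific type.

```typescript
// Hypothetical mapping from file extension to Content-Type.
// Parquet falls back to generic binary (an assumption; adjust as needed).
const CONTENT_TYPES: Record<string, string> = {
  json: 'application/json',
  csv: 'text/csv',
  parquet: 'application/octet-stream',
};

// Picks a Content-Type from the object key; unknown or missing extensions
// fall back to 'application/octet-stream'.
function contentTypeFor(key: string): string {
  const dot = key.lastIndexOf('.');
  const slash = key.lastIndexOf('/');
  // Only a dot inside the final path segment counts as an extension.
  if (dot === -1 || dot < slash) return 'application/octet-stream';
  const extension = key.slice(dot + 1).toLowerCase();
  return CONTENT_TYPES[extension] ?? 'application/octet-stream';
}
```

Pass the result as the object's `ContentType` when uploading, so that browsers and downstream tools interpret the object correctly.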
Step 4: Download Data
```typescript
import { GetObjectCommand } from '@aws-sdk/client-s3';
// s3Client is the client configured in Step 2.

async function downloadData(bucket: string, key: string): Promise<Buffer> {
  const response = await s3Client.send(
    new GetObjectCommand({ Bucket: bucket, Key: key }),
  );
  if (!response.Body) {
    throw new Error(`Object s3://${bucket}/${key} has no body`);
  }
  // The SDK's stream helper reads the whole body into memory.
  const bytes = await response.Body.transformToByteArray();
  return Buffer.from(bytes);
}
```
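`uploadData` takes a `Buffer` and `downloadData` returns one, so storing JSON configuration data needs an explicit encode and decode step. The sketch below shows a minimal version. The helper names `encodeJson` and `decodeJson` are hypothetical.

```typescript
// Serializes a value to UTF-8 JSON bytes suitable for an upload body.
function encodeJson(value: unknown): Buffer {
  return Buffer.from(JSON.stringify(value), 'utf8');
}

// Parses UTF-8 JSON bytes from a downloaded object. The type parameter is
// only a compile-time assertion; it does not validate the parsed shape.
function decodeJson<T>(bytes: Buffer): T {
  return JSON.parse(bytes.toString('utf8')) as T;
}
```

For example, `uploadData(bucket, 'app/settings.json', encodeJson(settings))` stores a settings object, and `decodeJson<Settings>(await downloadData(bucket, 'app/settings.json'))` reads it back. If the data comes from other producers, add runtime validation after decoding.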
Step 5: List Objects
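```typescript
import { ListObjectsV2Command, type _Object } from '@aws-sdk/client-s3';
// s3Client is the client configured in Step 2.

// Each ListObjectsV2 call returns at most 1,000 keys, so follow the
// continuation token until the listing is complete.
async function listObjects(bucket: string, prefix?: string): Promise<_Object[]> {
  const objects: _Object[] = [];
  let continuationToken: string | undefined;
  do {
    const response = await s3Client.send(
      new ListObjectsV2Command({ Bucket: bucket, Prefix: prefix, ContinuationToken: continuationToken }),
    );
    objects.push(...(response.Contents ?? []));
    continuationToken = response.IsTruncated ? response.NextContinuationToken : undefined;
  } while (continuationToken);
  return objects;
}
```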
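S3 and MinIO have a flat key space. The `/` in a key is only a naming convention, and that convention is what makes the `prefix` argument of `listObjects` useful. The sketch below uses hypothetical helpers, `objectKey` and `listPrefix`, to build keys and listing prefixes the same way everywhere. This keeps uploads and listings aligned.

```typescript
// Joins path segments into an object key, trimming stray slashes so that
// 'a/', '/b' and 'c' become 'a/b/c'. Empty segments are dropped instead of
// producing '//' in the key.
function objectKey(...segments: string[]): string {
  return segments
    .map((segment) => segment.replace(/^\/+|\/+$/g, ''))
    .filter((segment) => segment.length > 0)
    .join('/');
}

// Prefix for listing everything "under" a set of segments. The trailing '/'
// keeps 'events/2025' from also matching 'events/2025-archive'.
function listPrefix(...segments: string[]): string {
  const key = objectKey(...segments);
  return key.length > 0 ? `${key}/` : '';
}
```

For example, `listObjects('analytics-prod-events', listPrefix('events', '2025-01-27'))` lists only that day's objects.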
Step 6: Register in Data Catalog
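```typescript
// Placeholder: use the dataset schema your data catalog expects.
type DatasetMetadata = Record<string, unknown>;

async function registerDataset(metadata: DatasetMetadata): Promise<void> {
  // Relative URLs only resolve in the browser; server-side code (Node.js)
  // needs the catalog's absolute URL.
  const response = await fetch('/api/catalog/datasets', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(metadata),
  });
  if (!response.ok) {
    throw new Error(`Catalog registration failed: ${response.status} ${response.statusText}`);
  }
}
```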
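This guide leaves the shape of `DatasetMetadata` to your catalog. The sketch below is purely illustrative: it shows a hypothetical shape covering the catalog practices listed under Best Practices (location, format, owner, tags), plus a check to run before posting. Every field name is an assumption.

```typescript
// Hypothetical metadata shape; replace with your catalog's actual schema.
interface DatasetMetadata {
  name: string;
  bucket: string;
  prefix: string; // key prefix holding the dataset's objects
  format: 'parquet' | 'json' | 'csv';
  owner: string;
  tags: string[];
}

// Returns a list of problems; an empty list means the record looks complete.
function validateDatasetMetadata(metadata: DatasetMetadata): string[] {
  const problems: string[] = [];
  for (const field of ['name', 'bucket', 'owner'] as const) {
    if (metadata[field].trim() === '') problems.push(`${field} is empty`);
  }
  if (metadata.tags.length === 0) problems.push('at least one tag is expected');
  if (new Set(metadata.tags).size !== metadata.tags.length) {
    problems.push('tags contain duplicates');
  }
  return problems;
}
```

Returning a list of problems, not throwing on the first one, lets a migration script report every incomplete record in a single pass.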
Best Practices
Bucket Organization
- Use consistent naming: `{project}-{environment}-{type}`
- Examples: `analytics-prod-events`, `user-data-dev-profiles`
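The naming convention can be enforced in code. The sketch below uses a hypothetical helper, `bucketName`, that builds `{project}-{environment}-{type}` names. It checks them against a subset of the S3 bucket-naming rules: 3 to 63 characters, lowercase letters, digits, and hyphens only (dots are avoided entirely), and a letter or digit at both ends.

```typescript
// Subset of the S3 bucket-naming rules: 3-63 characters, lowercase
// letters, digits and hyphens, starting and ending with a letter or digit.
const BUCKET_NAME = /^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$/;

// Hypothetical helper: builds and validates `{project}-{environment}-{type}`.
function bucketName(project: string, environment: string, type: string): string {
  const name = [project, environment, type].join('-');
  if (!BUCKET_NAME.test(name)) {
    throw new Error(`"${name}" is not a valid bucket name`);
  }
  return name;
}
```

The project part may itself contain hyphens, as in `user-data`. Names are therefore easy to build but cannot be reliably split back into their parts. Record the mapping in the data catalog if you need it.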
Data Formats
- Use Parquet for analytics data
- Use JSON for configuration data
- Use CSV for simple data exports
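CSV exports stay simple until a field contains a comma, a quote, or a line break. The sketch below shows minimal RFC 4180-style quoting. The helpers `csvField` and `toCsv` are hypothetical, and a maintained CSV library is a reasonable alternative for larger exports.

```typescript
type Cell = string | number | boolean | null | undefined;

// Quotes a field only when needed: fields containing a comma, double quote,
// CR or LF are wrapped in quotes, and embedded quotes are doubled.
function csvField(value: Cell): string {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serializes a header row plus data rows, using CRLF line endings as
// RFC 4180 specifies.
function toCsv(header: string[], rows: Cell[][]): string {
  const allRows: Cell[][] = [header, ...rows];
  return allRows.map((row) => row.map(csvField).join(',')).join('\r\n');
}
```

The resulting string can be uploaded with `Buffer.from(csv, 'utf8')` and a `text/csv` content type.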
Access Control
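- Use bucket policies to restrict access to each bucket
- Give each application its own credentials with least-privilege policies (MinIO uses AWS IAM-compatible policy documents)
- Encrypt sensitive data at rest, and use TLS for connections to MinIO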
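MinIO access policies use the AWS IAM policy JSON format. As an illustration of least privilege, the sketch below uses a hypothetical helper, `readOnlyBucketPolicy`, that produces a read-only policy for a single bucket. You would attach the resulting document with MinIO's admin tooling. The minimal types cover only the fields used here.

```typescript
// Minimal types for an AWS IAM-style policy document.
interface PolicyStatement {
  Effect: 'Allow' | 'Deny';
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: '2012-10-17';
  Statement: PolicyStatement[];
}

// Read-only access to one bucket. s3:ListBucket applies to the bucket ARN
// itself; s3:GetObject applies to the objects inside it ("/*").
function readOnlyBucketPolicy(bucket: string): PolicyDocument {
  return {
    Version: '2012-10-17',
    Statement: [
      { Effect: 'Allow', Action: ['s3:ListBucket'], Resource: [`arn:aws:s3:::${bucket}`] },
      { Effect: 'Allow', Action: ['s3:GetObject'], Resource: [`arn:aws:s3:::${bucket}/*`] },
    ],
  };
}
```

`JSON.stringify(readOnlyBucketPolicy('analytics-prod-events'), null, 2)` produces the policy file. Because the policy contains no write actions, a leaked read-only credential cannot modify data.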
Data Catalog
- Register all datasets
- Include metadata
- Tag appropriately
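To check that every dataset is registered, compare what a bucket contains with what the catalog knows about. The sketch below is minimal. It assumes the catalog can export the key prefixes it has registered, which is a hypothetical capability. The helpers `findUnregisteredKeys` and `groupByTopLevel` are also hypothetical.

```typescript
// Returns the object keys not covered by any registered dataset prefix.
// Keys would come from listObjects (Step 5); prefixes from the catalog.
function findUnregisteredKeys(keys: string[], registeredPrefixes: string[]): string[] {
  return keys.filter(
    (key) => !registeredPrefixes.some((prefix) => key.startsWith(prefix)),
  );
}

// Counts keys by their first path segment, so a report lists candidate
// datasets instead of individual files.
function groupByTopLevel(keys: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const key of keys) {
    const top = key.split('/')[0];
    counts.set(top, (counts.get(top) ?? 0) + 1);
  }
  return counts;
}
```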
Migration Checklist
- Install S3 client
- Configure S3 client
- Create buckets
- Set up access credentials
- Migrate data
- Update code to use S3
- Register in data catalog
- Test data access
- Update documentation
- Set up monitoring
Last Updated: 2025-01-27