Data Intelligence
ML Data Engineer
The bridge between raw data and model-ready inputs
Specializes in the data layer closest to models: feature stores, embedding pipelines, vector databases, and retrieval-augmented generation infrastructure.
What this role covers
Feature engineering: Turning raw data into the signals AI models can actually use
Embedding pipelines: Chunking, embedding, indexing, and retrieving at scale
Vector infrastructure: Choosing and operating the right store (Pinecone, Weaviate, pgvector)
RAG pipelines: Retrieval-augmented generation, making AI systems knowledge-aware
Evaluation data: Building ground truth datasets that make model assessment possible
When you need this role
Companies building RAG or knowledge retrieval systems
"We want AI that actually knows our product and our docs. We have no idea how to build the retrieval layer. Our vector store is a mess."
E-commerce, recommendation, personalization
"Our recommendations are generic because the feature pipeline feeding the model is three months stale. We need real-time feature engineering."