Data Intelligence
ML Data Engineer
The bridge between raw data and model-ready inputs
Specializes in the data layer closest to models: feature stores, embedding pipelines, vector databases, and retrieval-augmented generation (RAG) infrastructure.
What this role covers
Feature engineering: turning raw data into the signals AI models can actually use
Embedding pipelines: chunking, embedding, indexing, and retrieving at scale
Vector infrastructure: choosing and operating the right store, such as Pinecone, Weaviate, or pgvector
RAG pipelines: retrieval-augmented generation, which makes AI systems knowledge-aware
Evaluation data: building ground-truth datasets that make model assessment possible
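To make the embedding pipeline item concrete, here is a minimal sketch of the four stages it names: chunk, embed, index, retrieve. Everything in it is an illustrative assumption rather than a production design. The function names (`chunk`, `embed`, `build_index`, `retrieve`) are hypothetical, the "embedding" is a toy normalized bag-of-words vector standing in for a real embedding model, and the "index" is an in-memory list standing in for a vector store such as Pinecone, Weaviate, or pgvector.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=8, overlap=2):
    """Split text into overlapping word windows (hypothetical helper)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy embedding: L2-normalized bag-of-words counts.

    Assumption: a real pipeline would call an embedding model here.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def build_index(docs, max_words=8, overlap=2):
    """Chunk and embed every document; the list stands in for a vector store."""
    index = []
    for doc_id, text in docs.items():
        for piece in chunk(text, max_words, overlap):
            index.append((doc_id, piece, embed(piece)))
    return index


def retrieve(index, query, k=2):
    """Return the k chunks with the highest cosine similarity to the query."""
    q = embed(query)
    scored = [
        (sum(w * vec.get(tok, 0.0) for tok, w in q.items()), doc_id, piece)
        for doc_id, piece, vec in index
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    docs = {
        "billing": "Invoices are issued on the first day of each month",
        "auth": "Reset your password from the account settings page",
    }
    for score, doc_id, piece in retrieve(build_index(docs), "how do I reset my password"):
        print(f"{score:.3f}  {doc_id}: {piece}")
```

In a RAG pipeline, the retrieved chunks would then be passed to the model as context. Swapping the toy `embed` for a real model and the list for a managed vector store changes the scale, not the shape, of this flow.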
When you need this role
Companies building RAG or knowledge retrieval systems
"We want AI that actually knows our product and our docs. We have no idea how to build the retrieval layer. Our vector store is a mess."
E-commerce, recommendation, personalization
"Our recommendations are generic because the feature pipeline feeding the model is three months stale. We need real-time feature engineering."