Optimizing an AI solution for capacity and cost, while scaling for growth, means taking a fresh look at its data pipeline.
Let’s break down the various stages of an AI workload and explain the role your data pipeline plays along the way.
Key Points

The volume, velocity, and variety of data coursing through the AI pipeline change at every stage. Right out of the gate, an AI infrastructure must be structured to take in massive amounts of data, even if not all of it is used for training neural networks.
“Data sets can arrive in the pipeline as petabytes, move into training as gigabytes of structured and semi-structured data, and complete their journey as trained models in the kilobyte size,” noted Roger Corell, storage marketing manager at Intel.
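To make that progression concrete, here is a minimal sketch of the orders of magnitude involved. The specific sizes are illustrative assumptions, not figures from the article; only the petabytes-to-gigabytes-to-kilobytes shape comes from the quote above.

```python
# Illustrative only: rough byte counts at each pipeline stage, following
# the petabytes -> gigabytes -> kilobytes progression described above.
# The exact numbers are assumptions for the sake of the example.

STAGE_BYTES = {
    "ingest (raw source data)": 2 * 1024**5,    # ~2 PiB arriving in the pipeline
    "training (curated set)":   500 * 1024**3,  # ~500 GiB of structured data
    "trained model (weights)":  800 * 1024,     # ~800 KiB model artifact
}

def human(n: float) -> str:
    """Format a byte count with a binary-prefix unit."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB", "PiB"):
        if n < 1024:
            return f"{n:.0f} {unit}"
        n /= 1024
    return f"{n:.0f} EiB"

for stage, size in STAGE_BYTES.items():
    print(f"{stage}: {human(size)}")
```

The takeaway is the ratio, not the absolute numbers: each stage shrinks the working set by several orders of magnitude, which is why storage requirements differ so sharply across the pipeline.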
An AI infrastructure built for today's needs will invariably have to grow to handle larger data volumes and more complex models.
Efficient protocols like NVMe make it possible to disaggregate, or separate, storage and still maintain the low latencies needed by AI. At the 2019 Storage Developer Conference, Dr. Sanhita Sarkar, global director of analytics software development at Western Digital, gave multiple examples of disaggregated data pipelines for AI, which included pools of GPU compute, shared pools of NVMe-based flash storage, and object storage for source data or archival, any of which could be expanded independently.
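The key property of that layout is independent scaling: any one pool can be expanded without touching the others. The sketch below models that idea in plain Python; the pool names and capacities are hypothetical, not taken from the talk.

```python
# Hypothetical sketch of a disaggregated layout: three independently
# expandable resource pools, mirroring the GPU compute, shared NVMe
# flash, and object-storage pools described above. All names are
# illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class Pool:
    """A resource pool that can grow without touching other pools."""
    name: str
    units: list = field(default_factory=list)

    def expand(self, unit: str) -> None:
        self.units.append(unit)

gpu_pool = Pool("gpu-compute", ["gpu-node-1", "gpu-node-2"])
nvme_pool = Pool("nvme-flash", ["nvme-shelf-1"])
object_pool = Pool("object-store", ["archive-bucket-1"])

# Storage pressure grows, so only the flash pool is expanded; the GPU
# pool is untouched -- that independence is the point of disaggregation.
nvme_pool.expand("nvme-shelf-2")
```

In a converged design, adding storage would mean adding whole servers (and their GPUs) whether or not more compute was needed; disaggregation over a low-latency fabric like NVMe removes that coupling.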