Getting started on your Artificial Intelligence (AI), Internet of Things (IoT), machine learning (ML), or natural language processing (NLP) journey is almost impossible without data. For a single data scientist or developer working on a small hobby project, a local machine running a simple Jupyter notebook with sample data and a basic prototype may be enough. But to build enterprise AI/ML solutions for your product or company, and then scale them, you need a data ingestion and egress system that supports collaboration, snapshotting, and lineage analysis (also known as impact analysis).
However, every factor that goes into an AI storage system comes down to the purpose of your initiative. Is it lightweight? Is it just one person? Is the budget unlimited? Is the project throwaway? So here are three ways to create a starter AI data storage system, without going into too much detail. You may simply like one at first but grow to love it, depending on the context and depth of your initiative.
#1 – Cloud Object Storage
#2 – Git (specifically Git Branching)
#3 – Data Lake House
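Whichever option you pick, the snapshotting and lineage requirements mentioned earlier come down to a simple idea: store each version of a dataset immutably, and record which versions were derived from which. The sketch below illustrates that idea with a toy in-memory store. It is an assumption-laden illustration, not how object storage, Git, or a lakehouse implements it; the `SnapshotStore` class and its `snapshot` and `impact` methods are hypothetical names. Snapshots are keyed by a truncated SHA-256 hash of their content, and "impact analysis" is a breadth-first walk over the recorded lineage edges to find everything downstream of a changed snapshot.

```python
import hashlib
from collections import defaultdict, deque


class SnapshotStore:
    """Hypothetical toy store: immutable, content-addressed snapshots plus lineage edges.

    Everything lives in memory; a real system would persist blobs and lineage.
    """

    def __init__(self):
        self.blobs = {}                   # snapshot id -> raw bytes
        self.children = defaultdict(set)  # snapshot id -> ids derived from it

    def snapshot(self, data: bytes, parents=()):
        """Store data under a (truncated) SHA-256 hash and record its parent snapshots."""
        snap_id = hashlib.sha256(data).hexdigest()[:12]  # truncation is a simplification
        self.blobs[snap_id] = data
        for parent in parents:
            self.children[parent].add(snap_id)
        return snap_id

    def impact(self, snap_id):
        """Return every snapshot downstream of snap_id (breadth-first over lineage)."""
        affected, queue = set(), deque([snap_id])
        while queue:
            for child in self.children[queue.popleft()]:
                if child not in affected:
                    affected.add(child)
                    queue.append(child)
        return affected


# Usage: raw data -> cleaned data -> features.
store = SnapshotStore()
raw = store.snapshot(b"id,text\n1,hello\n")
clean = store.snapshot(b"id,text\n1,HELLO\n", parents=[raw])
features = store.snapshot(b"id,len\n1,5\n", parents=[clean])
print(store.impact(raw) == {clean, features})
```

Because snapshot IDs are derived from content, storing identical bytes twice yields the same ID, which gives deduplication for free. Asking for the impact of `raw` returns both the cleaned data and the features built from it, which is exactly the question lineage analysis is meant to answer before you change or delete a dataset.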