Model-centric to Data-centric AI – am I missing something?
Andrew Ng is a key reference point for me in understanding AI.
Andrew Ng is always easy to understand – especially for new and complex ideas.
Hence, it’s a bit challenging for me when I cannot fully understand something from Andrew.
Recently, Andrew has been proposing that MLOps should move from model-centric to data-centric.
This has created a lively discussion: for example, this conversation on LinkedIn has 778-plus comments.
There is also a good YouTube discussion.
Why is this still not clear to me?
Yet for me, this idea of Model-centric to Data-centric AI is not fully clear
Let me elaborate
Recently, someone sought my advice on writing a new book on MLOps
I advised against it because MLOps is a crowded space already
So, when I think of MLOps moving from model-centric to data-centric, I find it hard to distinguish this idea from MLOps itself.
And for AI practitioners, MLOps is not new.
In fact, I would argue that if you are a large bank or similar institution, you could not risk deploying a model without MLOps
The second point is that model-centric vs data-centric is a dichotomy, but in reality there are more than two elements.
For example, you would need to consider at least data, models and features, not just model vs data.
The original discussion is framed as:
Would love your feedback on this: AI Systems = Code (model/algorithm) + Data. Most academic benchmarks/competitions hold the Data fixed, and let teams work on the Code. Thinking of organizing something where we hold the Code fixed, and ask teams to work on the Data.
Hoping this will more closely reflect ML application practice, and also spur innovative research on data-centric AI development. What do you think?
I think in the above the operative word is ‘academic’
If so, that brings more clarity
AI is a unique discipline because it brings academic research and practice together much more closely than other disciplines do.
And the two worlds are quite different.
So, while MLOps is the norm for practitioners, it may not be so obvious to all as different perspectives amalgamate.
Some more comments
- Raising the significance of good data for a model is always a good idea
- In larger projects, at least three job types work together (data engineers, data scientists and DevOps engineers). So, again, there is value in raising awareness of MLOps
- Are we trying to say that MLOps should be about ensuring that data is consistent and of high quality throughout the project lifecycle? That could mean a data-driven emphasis on MLOps. Raising that awareness is also a good thing, as per the comment “Important frontier: MLOps tools to make data-centric AI an efficient and systematic process.”
I find the framework of Model-centric to Data-centric AI limiting in the sense of holding the model fixed and varying the data, or vice versa. Nevertheless, it helps to raise awareness of data itself and could be useful where different perspectives on AI interact.
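To make the "hold the code fixed, work on the data" framing concrete, here is a minimal toy sketch. It is only an illustration under my own assumptions, not anything from Andrew's post: the nearest-centroid classifier, the function names and the tiny one-dimensional dataset are all hypothetical, and the label noise is deliberately exaggerated so the effect is visible.

```python
from statistics import mean

# The fixed "code": a tiny nearest-centroid classifier (hypothetical example).
def train_centroids(samples):
    """Compute one centroid per label from (x, label) pairs."""
    by_label = {}
    for x, label in samples:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def accuracy(centroids, samples):
    """Fraction of (x, label) pairs that the classifier gets right."""
    hits = sum(predict(centroids, x) == label for x, label in samples)
    return hits / len(samples)

# Made-up data: class "a" sits near 1, class "b" sits near 9.
# Half of the training labels are wrong (exaggerated on purpose).
noisy_train = [
    (1.0, "a"), (2.0, "a"), (9.0, "a"), (10.0, "a"),
    (8.0, "b"), (9.5, "b"), (1.5, "b"), (0.5, "b"),
]

# The data-centric step: same points, with the labels corrected.
clean_train = [
    (1.0, "a"), (2.0, "a"), (9.0, "b"), (10.0, "b"),
    (8.0, "b"), (9.5, "b"), (1.5, "a"), (0.5, "a"),
]

test_set = [(1.2, "a"), (2.5, "a"), (8.5, "b"), (9.8, "b")]

# Same code in both runs; only the training data differs.
noisy_acc = accuracy(train_centroids(noisy_train), test_set)
clean_acc = accuracy(train_centroids(clean_train), test_set)

print(f"accuracy with noisy labels: {noisy_acc:.2f}")
print(f"accuracy with clean labels: {clean_acc:.2f}")
```

The classifier is never touched between the two runs, so any change in accuracy comes from the data alone. The model-centric counterpart would keep `noisy_train` as it is and change the classifier instead.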
Image source: Andrew Ng