Choosing the Right Data Backbone: Opinion on the Suitability of Database Models for Artificial Intelligence (AI) Training Data Management
Abstract
Artificial intelligence (AI) systems rely heavily on large volumes of diverse, high-quality training data. Efficient storage and management of this data are critical to the performance, reproducibility and scalability of AI models. This opinion paper examines the suitability of various database models, including relational databases, NoSQL systems, object storage and data lakes, for managing AI training datasets. Drawing on existing literature and practical insights, the paper highlights the strengths and limitations of each model in supporting AI workflows. It advocates hybrid and adaptive architectures that can meet the growing complexity and demands of AI development.