Technology Challenges and Research Opportunities in Database Systems
- By Yannis Ioannidis
- On 2025-03-13
- At ZHONGGUANYUAN GLOBAL VILLAGE, PKU
Abstract
Database Systems have existed for six decades and have had significant impact on how private and public organizations manage their data in the context of industrial, societal, and scientific activities. The relational model and the SQL language are the global standards for accessing databases and have served as the foundations for advancing the field through the years, creating a multi-billion industry around the world. With the current advances in several other technological areas, notably, new general and specialized hardware, high-performance and high-throughput (cloud) computing, and machine learning / AI / LLMs, database technology is in front of yet another revolution of its own, addressing numerous new challenges, pointing to new directions for industrial advances, and opening new opportunities for exciting research. In this presentation, I will give a quick tour of the status of database systems and then elaborate on some relevant research and technological challenges that could possibly lead to the next breakthroughs in the field. Particular attention will be given to applications (e.g., data science, data analytics, and edge computing), where User-Defined Functions (UDFs) are used within relational database queries in SQL. In such contexts, I will give some highlights of YeSQL, a language and system developed with colleagues and members of my team, which is effectively addressing the impedance mismatch that exists between UDF evaluation and relational operator processing, achieving significant speedups of up to 68x in common, practical use cases compared to earlier approaches and alternative implementation choices.
Technological Challenges
1. AI = ML + LLM
- Query Optimization
- Admission Control
- LLM-Enhanced User Interfaces
- Database-Retrieval-Enhanced LLMs
  - RAG (retrieval-augmented generation)
  - Embedding
  - Top-K relevant documents
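The RAG, embedding, and top-K items above can be sketched together as follows. This is a minimal illustration, not something shown in the talk: `embed` is a toy bag-of-words stand-in for a learned embedding model, and `top_k`, `build_prompt`, and `DOCS` are hypothetical names introduced only for this example.

```python
import heapq
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (assumption: stands in for a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    scored = ((cosine(q, embed(d)), d) for d in documents)
    return [d for _, d in heapq.nlargest(k, scored, key=lambda pair: pair[0])]


def build_prompt(query, documents, k=2):
    """Prepend the retrieved documents as context for an LLM prompt."""
    context = "\n".join(top_k(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical document store.
DOCS = [
    "query optimization chooses a join order",
    "federated learning keeps hospital data local",
    "buffer management caches pages in memory",
]
```

Calling `build_prompt("how does query optimization pick a join order", DOCS, k=1)` places the query-optimization document in the context. A real system would replace `embed` with a trained model and the linear scan with a vector index.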
2. Cloud
- Privacy-based Processing of Shared Sensitive Data
- MIP Federated Learning
- New Architectures for User-Defined Functions (UDFs)
Challenges: Performance
Their work: YeSQL
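To make the UDF performance challenge concrete, here is a minimal sketch of a scalar Python UDF called from SQL. It uses the standard-library `sqlite3` module and is not YeSQL itself; `normalize_title`, `run_demo`, and the `papers` table are hypothetical names chosen for illustration.

```python
import sqlite3


def normalize_title(text):
    """Hypothetical scalar UDF: trim whitespace and title-case a string."""
    return None if text is None else text.strip().title()


def run_demo():
    conn = sqlite3.connect(":memory:")
    # Register the Python function so SQL can call it as normalize_title(x).
    conn.create_function("normalize_title", 1, normalize_title)
    conn.execute("CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO papers (title) VALUES (?)",
        [("  query optimization ",), ("user-defined FUNCTIONS",)],
    )
    # The engine calls back into Python once per row here; this crossing
    # between relational operators and UDF code is where overhead accumulates.
    rows = conn.execute(
        "SELECT normalize_title(title) FROM papers ORDER BY id"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

In this plain setup, every row crosses the boundary between the SQL engine and the Python interpreter. That per-row boundary is one place where the mismatch between UDF evaluation and relational operator processing shows up.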
3. Hardware
- Hardware Accelerators
- GPU
- TPU
- … parallel
CPU is the control plane, Accelerators are the data plane
- New Memory for Buffer Management and Query Processing
- Two-Tier Memory
- Processing in Memory, PIM
Societal Challenges
4. Energy Consumption
(SUST) New Everything for Sustainability
- Energy-Efficient Hardware / Sustainability of Computation
- Scope 1: Direct
- Scope 2: Query processing and optimization
- Scope 3: Data application operating
Q&A
Why is SQL interpreted rather than compiled?
① Varying situations: 100 rows and 100,000 rows are processed differently
② Varying hardware platforms: execution on a CPU differs from execution on an FPGA
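The first answer can be illustrated with a toy cost model in which the cheapest plan depends on input sizes that are known only at run time. This is one reading of the answer, not something presented in the talk; the constants and function names below are hypothetical, and a real optimizer would calibrate its costs per platform.

```python
# Hypothetical constants for a toy cost model; a different platform
# (for example an FPGA instead of a CPU) would use different values.
HASH_SETUP_COST = 1000   # one-off cost of building a hash table
HASH_PER_ROW_COST = 3    # cost of hashing or probing one row


def nested_loop_cost(outer_rows, inner_rows):
    """Compare every outer row with every inner row."""
    return outer_rows * inner_rows


def hash_join_cost(outer_rows, inner_rows):
    """Pay a setup cost once, then a small cost per row."""
    return HASH_SETUP_COST + HASH_PER_ROW_COST * (outer_rows + inner_rows)


def choose_join(outer_rows, inner_rows):
    """Pick the cheaper join strategy for the given input sizes."""
    nl = nested_loop_cost(outer_rows, inner_rows)
    hj = hash_join_cost(outer_rows, inner_rows)
    return "nested_loop" if nl <= hj else "hash_join"
```

Under these assumed constants, `choose_join(100, 10)` selects the nested-loop join, while `choose_join(100_000, 10)` selects the hash join. The same query text therefore ends up with different plans for different data sizes.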
Official Summary
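Professor Ioannidis began with a brief review of the history and basic architecture of database systems, then pointed out the changes they have been undergoing in recent years. First, AI techniques such as machine learning methods and large language models are increasingly being applied within database systems. Second, many database systems have migrated from on-premises environments to the cloud. Third, the general-purpose and specialized hardware on which database systems run is also changing. In Professor Ioannidis's view, these three changes bring new opportunities and challenges for both the industrial development of database systems and academic research on them.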
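On the integration of AI with database systems, Professor Ioannidis outlined directions at three levels. First, inside the database system, classic problems such as query optimization and admission control have traditionally been solved with mathematical modeling and heuristics, and machine learning methods can now outperform these traditional approaches. Second, the arrival of large language models lets users interact with database systems in natural language, greatly lowering the learning cost for users. Finally, database systems can in turn extend the knowledge boundary of large language models through retrieval-augmented generation.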
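Turning to the new challenges of cloud environments, Professor Ioannidis presented his research from two perspectives: protection of sensitive data and user-defined functions. Using the sharing of rare-disease data among hospitals as an example, he described his team's practical work on federated learning. In response to the architectural shift toward decoupled compute and storage in the cloud, his team proposed YeSQL, which offloads user-defined functions to the storage side and thereby improves performance by up to 68x.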
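On hardware change, Professor Ioannidis sees the emergence of heterogeneous accelerators as both a major challenge for query optimization in database systems and a new opportunity for researchers. Meanwhile, the emergence of new memory and storage media makes offloading computation to the memory and storage side a direction worth exploring.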
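Finally, Professor Ioannidis pointed out that sustainability is also an important goal for database systems research: future database systems should use less energy and hardware and thereby reduce greenhouse gas emissions.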
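In the interactive session, attending faculty and students engaged actively with Professor Ioannidis on the content of the lecture. He answered their questions warmly and encouraged young scholars to take up research on database systems. The venue was filled to capacity and the atmosphere was lively.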