
Dept. of Electrical and Computer Engineering
Data Storage Lab (DSL)
Center for Cybersecurity Innovation & Outreach (CyIO)
Center for Wireless, Communities and Innovation (WiCI)
Phone: (515) 294-6285
Email: mai AT iastate DOT edu
www.ece.iastate.edu/~mai
Data Storage Lab
Our research is motivated by problems in real-world systems that jeopardize data, e.g.:
- Data Corruptions@HPC Center [TOS'18 & TOS'22] [Root cause: power outages + bugs in failure handling]
- Data Loss & Service Interruption@Supercomputing Centers [Root cause: bugs in backup utilities/failure handling]
- Server Crashes@Algolia Data Center: When Solid State Drives Are Not That Solid [Root cause: a bug in Linux kernel + different sensitivity of SSDs]
- Serialization Errors on SSDs [TOCS'16] [Root cause: buggy interactions b/w Linux kernel & SSDs]
We build systems/tools to attack such problems and open source our research prototypes.
[more ...]
Selected Publications
- λFS: A Scalable and Elastic Distributed File System Metadata Service using Serverless Functions. [to appear in ASPLOS'24]
- Understanding Persistent-Memory Related Issues in the Linux Kernel. [to appear in TOS'23]
- ConfD: Analyzing Configuration Dependencies of File Systems for Fun and Profit. [FAST'23]
- Drill: Log-based Anomaly Detection for Large-scale Storage Systems Using Source Code Analysis. [IPDPS'23]
- FaultyRank: A Graph-based Parallel File System Checker. [IPDPS'23]
- Data Distribution for Heterogeneous Storage Systems. [TOC'22]
- PROV-IO: An I/O-Centric Provenance Framework for Scientific Data on HPC Systems. [HPDC'22]
- Understanding Configuration Dependencies of File Systems. [HotStorage'22] Best paper nominee!
- A Study of Failure Recovery and Logging of High-Performance Parallel File Systems. [TOS'22]
- Benchmarking for Observability: The Case of Diagnosing Storage Failures. [TBench'21] [BugBenchk]
- ARA: A Wireless Living Lab Vision for Smart and Connected Rural Communities. [WiNTECH'21] [ARA Platform]
- SentiLog: Anomaly Detection on Parallel File Systems via Log-based Sentiment Analysis. [HotStorage'21] Best paper nominee!
- A Study of Persistent Memory Bugs in the Linux Kernel. [SYSTOR'21]
- Lessons and Actions: What We Learned from 10K SSD-Related Storage System Failures. [USENIX ATC'19]
- A Performance Study of Lustre File System Checker: Bottlenecks and Potentials. [MSST'19]
- Towards Robust File System Checkers. [TOS'18] Fast-tracked!
- Data Storage Research Vision 2025. [NSF Visioning Workshop]
- Understanding SSD Reliability in Large-Scale Cloud Systems. [SC'18-PDSW]
- PFault: A General Framework for Analyzing the Reliability of High-Performance Parallel File Systems. [ICS'18]
- Towards Robust File System Checkers. [FAST'18] Best paper nominee!
- Understanding the Fault Resilience of File System Checkers. [HotStorage'17]
- Reliability Analysis of SSDs under Power Fault. [TOCS'16]
- Torturing Databases for Fun and Profit. [OSDI'14]
- GMRace: Detecting Data Races in GPU Programs via A Low-Overhead Scheme. [TPDS'14]
- Understanding the Robustness of SSDs under Power Fault. [FAST'13]
- 2ndStrike: Towards Manifesting Hidden Concurrency Typestate Bugs. [ASPLOS'11]
- GRace: A Low-Overhead Mechanism for Detecting Data Races in GPU Programs. [PPoPP'11]
[more ...]
Prospective Students
I'm always looking for self-motivated, intellectually strong students who are curious about how computer systems work and are interested in improving the design, implementation, evaluation, and application of various computer systems. Please check out our recent publications & projects and let me know if anything interests you. If you have experience in building systems, that's great! Let's talk and see if we have mutual interests. If you come from a different background, that's OK, too. I'm more than happy to pass on my hands-on experience and help you grow and succeed, as long as you are hardworking, responsible, determined, and eager to become an expert in a challenging, high-impact area in the near future.
[more ...]
Teaching
- ISU CprE563 Advanced Data Storage Systems [Spring'20, Spring'21, Spring'22, Spring'23]
- ISU CprE308 Operating Systems [Fall'18, Fall'19, Fall'20, Fall'21, Spring'22, Fall'22, Spring'23]
- ISU CprE588 Embedded Computer Systems [Spring'19]
- NMSU CS479/579 Special Topics: Reliable Storage Systems [Fall'17]
- NMSU CS479/579 Special Topics: Modern Storage Systems: Flash, Cloud, & Beyond [Spring'16]
- NMSU CS574 Operating Systems II [Spring'17, Spring'18]
- NMSU CS474 Operating Systems I [Fall'15, Fall'16]
- NMSU CS573 Computer Architecture II [Fall'17]
- NMSU CS473 Computer Architecture I [Spring'18]
- NMSU CS491/521 Parallel Programming [Fall'16]
- OSU CSE4251 The UNIX Programming Environment [Fall'14, Spring'15]
[more ...]
Miscellaneous
- Why (Not) Do a PhD (in Computer Science/Engineering): [CRA | Professor@Purdue | Student@Harvard | The PhD Grind]
- Why Iowa State: [Facts & Rankings | Computer Engineering: 34th best in the nation (20th among public schools) | An award-winning campus in a top-ranked city]
- Resources: [Advice on Research and Writing | Advice Collection | Best free OS textbook: OSTEP | Measure, Then Build: Tips on the Process of Systems Research] [Building Secure & Reliable Systems]
- Failures of real-world systems: [Amazon | Chameleon/OpenStack | Kyoto | Algolia | HPCC | OSC1 | OSC2] [Why Does the Cloud Stop Computing]
- Personal Gallery