Publications

Our researchers have been working for years on how to apply AI/ML to data systems and data systems to AI/ML. Here’s a list of some of the papers that they’ve published. Check back frequently for new publications coming out of the collaborative efforts of the DSAIL team.

 

2019

Darryl Ho, Jialin Ding, Sanchit Misra, Nesime Tatbul, Vikram Nathan, Vasimuddin Md and Tim Kraska. 2019. LISA: Towards Learned DNA Sequence Search. Workshop on Systems for ML at Neural Information Processing Systems (NeurIPS) 2019 (to appear).

 

Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao and Mohammad Alizadeh. 2019. Learning Generalizable Device Placement Algorithms for Distributed Machine Learning. Neural Information Processing Systems (NeurIPS) 2019.

 

Zhijian Liu, Haotian Tang, Yujun Lin and Song Han. 2019. Point-Voxel CNN for Efficient 3D Deep Learning. Neural Information Processing Systems (NeurIPS) 2019.

 

Ligeng Zhu, Zhijian Liu and Song Han. 2019. Deep Leakage from Gradients. Neural Information Processing Systems (NeurIPS) 2019.

 

Joshua Robinson, Suvrit Sra and Stefanie Jegelka. 2019. Flexible Modeling of Diversity with Strongly Log-Concave Distributions. Neural Information Processing Systems (NeurIPS) 2019.

 

Matthew Staib and Stefanie Jegelka. 2019. Distributionally Robust Optimization and Generalization in Kernel Methods. Neural Information Processing Systems (NeurIPS) 2019.

 

Hongzi Mao, Parimarjan Negi, Akshay Narayan, Hanrui Wang, Jiacheng Yang, Haonan Wang, Ryan Marcus, Ravichandra Addanki, Mehrdad Khani, Songtao He, Vikram Nathan, Frank Cangialosi, Shaileshh Bojja Venkatakrishnan, Wei-Hung Weng, Song Han, Tim Kraska and Mohammad Alizadeh. 2019. Park: An Open Platform for Learning-Augmented Computer Systems. Neural Information Processing Systems (NeurIPS) 2019. [code] [blog post]

 

Ji Lin, Chuang Gan, Song Han. 2019. TSM: Temporal Shift Module for Efficient Video Understanding. International Conference on Computer Vision (ICCV). [paper] [demo] [code] [industry integration]

 

Lei Cao, Wenbo Tao, Sungtae An, Jing Jin (Massachusetts General Hospital), Yizhou Yan, Xiaoyu Liu, Wendong Ge (Massachusetts General Hospital), Adam Sah, Leilani Battle, Jimeng Sun, Remco Chang, Brandon Westover (Massachusetts General Hospital), Samuel Madden, Michael Stonebraker. 2019. Smile: A System to Support Machine Learning on EEG Data at Scale, VLDB 2019 (Industry Track paper)

 

El Kindi Rezig, Lei Cao, Michael Stonebraker, Giovanni Simonini, Wenbo Tao, Samuel Madden, Mourad Ouzzani, Nan Tang, Ahmed Elmagarmid. 2019. Data Civilizer 2.0: A Holistic Framework for Data Preparation and Analytics. VLDB 2019. (Demo paper)

 

Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, Nesime Tatbul. 2019. “Neo: A Learned Query Optimizer.” VLDB 2019. [pdf]

 

Zeyuan Shang, Emanuel Zgraggen, Benedetto Buratti, Ferdinand Kossmann, Philipp Eichmann, Yeounoh Chung, Carsten Binnig, Eli Upfal, Tim Kraska. 2019. Democratizing Data Science through Interactive Curation of ML Pipelines. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD '19). ACM, New York, NY, USA, 1171-1188. DOI

 

Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, Tim Kraska. 2019. FITing-Tree: A Data-aware Index Structure. In arXiv, 2019. [project page] [pdf]

 

Tobias Ziegler, Sumukha Tumkur Vani, Carsten Binnig, Rodrigo Fonseca, Tim Kraska. 2019. Designing Distributed Tree-based Index Structures for Fast RDMA-capable Networks. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD '19). ACM, New York, NY, USA, 741-758. DOI

 

Stratos Idreos and Tim Kraska. 2019. From Auto Tuning One Size Fits All to Self-Designed and Learned Data Intensive Systems.  In Proceedings of the 2019 International Conference on Management of Data (SIGMOD '19). ACM, New York, NY, USA, 2054-2059. DOI

 

Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, Mohammad Alizadeh. 2019. Learning Scheduling Algorithms for Data Processing Clusters. SIGCOMM 2019: 270-288. [project page]

 

Vikram Nathan, Vibhaalakshmi Sivaraman, Ravichandra Addanki, Mehrdad Khani, Prateesh Goyal, Mohammad Alizadeh. 2019. End-to-End Transport for Video QoE Fairness. SIGCOMM 2019.

 

Hongzi Mao, Akshay Narayan, Parimarjan Negi, Hanrui Wang, Jiacheng Yang, Haonan Wang, Mehrdad Khani, Songtao He, Ravichandra Addanki, Ryan Marcus, Frank Cangialosi, Wei-Hung Weng, Song Han, Tim Kraska, Mohammad Alizadeh. 2019. Park: An Open Platform for Learning Augmented Computer Systems. RL for Real Life ICML 2019 Workshop (Best paper award).

 

Hongzi Mao, Shannon Chen, Drew Dimmery, Shaun Singh, Drew Blaisdell, Yuandong Tian, Mohammad Alizadeh, Eytan Bakshy. 2019.  Real-world Video Adaptation with Reinforcement Learning. RL for Real Life ICML 2019 Workshop.

 

Amy Zhao, Guha Balakrishnan, Fredo Durand, John V. Guttag, Adrian V. Dalca. 2019. Data Augmentation Using Learned Transformations for One-Shot Medical Image Segmentation. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 8543-8553.

 

Fredrik D. Johansson, Rajesh Ranganath, David Sontag. 2019. Support and Invertibility in Domain-Invariant Representations. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics (AISTATS) (To appear), 2019.

 

Omer Gottesman, Fredrik D. Johansson, Matthieu Komorowski, Aldo Faisal, David Sontag, Finale Doshi-Velez, Leo Anthony Celi. 2019. Guidelines for reinforcement learning in healthcare. Nature Medicine, 25(1): 16–18. 2019.

 

Lei Cao, Yizhou Yan, Samuel Madden, Elke Rundensteiner, Mathan Gopalsamy. 2019. Efficient Discovery of Sequence Outlier Patterns. PVLDB, 12(8): 920-932, 2019

 

Charlotte Bunne, David Alvarez-Melis, Andreas Krause, Stefanie Jegelka. Learning Generative Models across Incomparable Spaces. 2019. International Conference on Machine Learning (ICML), 2019.

 

Wenbo Tao, Xiaoyu Liu, Yedi Wang, Leilani Battle, Cagatay Demiralp, Remco Chang, Michael Stonebraker. Kyrix: Interactive Pan/Zoom Visualizations at Scale. 2019. Eurographics Conference on Visualization (EuroVis) 2019.

 

Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han. HAQ: Hardware-Aware Automated Quantization with Mixed Precision. 2019. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. (Oral presentation)

 

Matthew Staib, Bryan Wilder, Stefanie Jegelka. Distributionally Robust Submodular Maximization. 2019. International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.

 

Kevin Hu, Michiel A. Bakker, Stephen Li, Tim Kraska, César Hidalgo. VizML: A Machine Learning Approach to Visualization Recommendation. 2019. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), 2019. Read a summary.

 

Keyulu Xu, Weihua Hu, Jure Leskovec, Stefanie Jegelka. How Powerful are Graph Neural Networks? 2019. International Conference on Learning Representations (ICLR), 2019. (Oral Presentation)

 

Yeounoh Chung, Tim Kraska, Steven Euijong Whang, Neoklis Polyzotis. Slice Finder: Automated Data Slicing for Model Validation. 2019. IEEE International Conference on Data Engineering (ICDE), 2019.  (Short paper version to appear.)

 

Han Cai, Ligeng Zhu, Song Han. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. 2019. To appear in ICLR’19 (Seventh International Conference on Learning Representations).

 

Ji Lin, Chuang Gan, Song Han. Defensive Quantization: When Efficiency Meets Robustness. 2019. To appear in ICLR’19 (Seventh International Conference on Learning Representations).

 

Hongzi Mao, Shaileshh Bojja Venkatakrishnan, Malte Schwarzkopf, Mohammad Alizadeh. Variance Reduction for Reinforcement Learning in Input-Driven Environments. 2019. To appear in ICLR’19 (Seventh International Conference on Learning Representations).

 

Raul Castro Fernandez and Samuel Madden. Multi-Modal Data Exploration with a Learned Relational Embedding. North East Database Day 2019.

 

Ryan Marcus and Olga Papaemmanouil. Making ML-for-DB Easier: Operator Embeddings via Deep Learning. North East Database Day 2019.

 

Jialin Ding. Learned Multi-Dimensional Index for Data Warehouses. North East Database Day 2019. (Poster)

 

Leonhard Spiegelberg and Tim Kraska. Robust Data Centric Code Generation. North East Database Day 2019. (Poster)

 

Ani Kristo, Kapil Vaidya, Ugur Cetintemel, Tim Kraska. A Learned Sorting Algorithm. North East Database Day 2019. (Poster)

 

Manasi Vartak. Data Science is Growing Up: Building the DS/ML Infrastructure Backbone. North East Database Day 2019. (Poster)

 

Wenbo Tao, Xiaoyu Liu, Cagatay Demiralp, Remco Chang, Michael Stonebraker. Kyrix: Interactive Visual Data Exploration at Scale. 2019. Conference on Innovative Data Systems Research (CIDR) 2019. 

 

Tim Kraska, Mohammad Alizadeh, Alex Beutel, Ed Chi, Ani Kristo, Guillaume Leclerc, Samuel Madden, Hongzi Mao, Vikram Nathan. SageDB: A Learned Database System. 2019. Conference on Innovative Data Systems Research (CIDR) 2019.

 

2018

Songtao He, Favyen Bastani, Sofiane Abbar, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Samuel Madden. RoadRunner: Improving the Precision of Road Network Inference from GPS Trajectories. 2018. ACM SIGSPATIAL, Seattle, WA, November 2018. [PDF] [BibTex]

 

Favyen Bastani, Songtao He, Sofiane Abbar, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Samuel Madden. Machine-Assisted Map Editing. 2018. ACM SIGSPATIAL, Seattle, WA, November 2018. [PDF] [BibTex]

 

Irene Chen, Fredrik D. Johansson, David Sontag. Why is My Classifier Discriminatory? 2018. Neural Information Processing Systems (NeurIPS), 2018. Spotlight. Watch video.

 

Hongzhou Lin and Stefanie Jegelka. ResNet with one-neuron hidden layers is a Universal Approximator. 2018. Neural Information Processing Systems (NeurIPS), 2018. Spotlight.

 

Ilija Bogunovic, Jonathan Scarlett, Stefanie Jegelka, Volkan Cevher. 2018. Adversarially Robust Optimization with Gaussian Processes. Neural Information Processing Systems (NeurIPS), 2018. Spotlight.

 

Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken Kawarabayashi, Stefanie Jegelka. 2018. Representation Learning on Graphs with Jumping Knowledge Networks. International Conference on Machine Learning (ICML), 2018. Long talk.

 

Harini Suresh, Jen J. Gong, John V. Guttag. 2018. Learning Tasks for Multitask Learning: Heterogeneous Patient Populations in the ICU. Proceedings of the Knowledge Discovery and Data Mining Conference (KDD 2018), 2018.  

 

Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18).

 

Manasi Vartak, Joana M. F. da Trindade, Samuel Madden, Matei Zaharia. 2018. MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis.  In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18).

 

Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han. 2018. AMC: AutoML for Model Compression and Acceleration on Mobile Devices. ECCV’18 (European Conference on Computer Vision).

 

Michael L. Brodie, Michael Stonebraker, Ricardo Mayerhofer, Jialing Pei. The Case for the Co-evolution of Applications and Data, North East Database Day 2018 (NEDB 2018), January 19, 2018.

 

Yizhou Yan, Lei Cao, Samuel Madden, Elke Rundensteiner. 2018. SWIFT: Mining Representative Patterns from Large Event Streams. PVLDB, 12(3): 265-277, 2018.

 

Tim Kraska. Northstar: An Interactive Data Science System. 2018. PVLDB 11(12): 2150-2164 (2018)

 

Carsten Binnig, Benedetto Buratti, Yeounoh Chung, Cyrus Cousins, Tim Kraska, Zeyuan Shang, Eli Upfal, Robert C. Zeleznik, Emanuel Zgraggen. 2018. Towards Interactive Curation & Automatic Tuning of ML Pipelines. DEEM@SIGMOD 2018: 1:1-1:4

 

2017

Dong Deng, Raul Castro Fernandez, Ziawasch Abedjan, Sibo Wang, Michael Stonebraker, Ahmed Elmagarmid, Ihab Ilyas, Samuel Madden, Mourad Ouzzani, Nan Tang. 2017. “The Data Civilizer System.” CIDR 2017.

 

Michael Stonebraker, Dong Deng, Michael L. Brodie. 2017. Application-Database Co-Evolution: A New Design and Development Paradigm. New England Database Day, (pp. 1–3). January 2017.

 

2016

Manasi Vartak, Harihar Subramanyam, Wei-En Lee, Srinidhi Viswanathan, Saadiyah Husnoo, Samuel Madden, Matei Zaharia. 2016. MODELDB: A System for Machine Learning Model Management. HILDA 2016.

 

Michael Stonebraker, Dong Deng, Michael L. Brodie. 2016. Database Decay and How to Avoid It (pp. 1–10). Proceedings of the IEEE International Conference on Big Data, Washington, DC.  2016.

 

2015

Evan R. Sparks, Ameet Talwalkar, Michael J. Franklin, Michael I. Jordan, Tim Kraska. 2015. TuPAQ: An Efficient Planner for Large-scale Predictive Analytic Queries. CoRR abs/1502.00068 2015.

 

2013

Tim Kraska, Ameet Talwalkar, John C. Duchi, Rean Griffith, Michael J. Franklin, Michael I. Jordan. 2013. MLbase: A Distributed Machine-learning System. CIDR 2013.