Publications

Our researchers have been working for years on how to apply AI/ML to data systems and data systems to AI/ML. Here’s a list of some of the papers that they’ve published. Check back frequently for new publications coming out of the collaborative efforts of the DSAIL team.

 

2021

Yujun Lin, Mentian Yang, and Song Han. 2021. NAAS: Neural Accelerator Architecture Search. DAC 2021

 

Yujun Lin, Zhekai Zhang, Haotian Tang, Hanrui Wang and Song Han. 2021. PointAcc: Efficient Point Cloud Accelerator. MICRO 2021

 

Zhijian Liu, Simon Stent, Jie Li, John Gideon and Song Han. 2021.  LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision. IEEE/CVF International Conference on Computer Vision (ICCV 2021)

 

Songtao He, Mohammad Amin Sadeghi, Sanjay Chawla, Mohammad Alizadeh, Hari Balakrishnan and Samuel Madden. 2021. Inferring High-Resolution Traffic Accident Risk Maps Based on Satellite Imagery and GPS Trajectories. IEEE/CVF International Conference on Computer Vision (ICCV 2021). Read the article in MIT CSAIL News.

 

Favyen Bastani and Sam Madden. 2021. Beyond Road Extraction: A Dataset for Map Update using Aerial Images. IEEE/CVF International Conference on Computer Vision (ICCV 2021)

 

Zhijian Liu, Haotian Tang, Sibo Zhu and Song Han. 2021. SemAlign: Annotation-Free Camera-LiDAR Calibration with Semantic Alignment Loss. IROS 2021

 

Jialin Ding, Vikram Nathan, Mohammad Alizadeh and Tim Kraska. 2021. Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads. VLDB 2021.

 

Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska and Thomas Neumann. 2021. SOSD: A Benchmark for Learned Indexes. VLDB 2021.

 

Parimarjan Negi, Ryan C. Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska and Mohammad Alizadeh. 2021. Flow-Loss: Learning Cardinality Estimates That Matter. VLDB 2021.    

 

LADSIOS@VLDB Workshop: A Tutorial Workshop on Learned Algorithms, Data Structures and Instance-Optimized Systems. Stratos Idreos, Tim Kraska, and Umar Farooq Minhas, organizers. VLDB 2021.

 

Ibrahim Sabek, Kapil Vaidya, Dominik Horn, Andreas Kipf and Tim Kraska. 2021. When Are Learned Models Better Than Hash Functions. Applied AI for Database Systems and Applications Workshop (AIDB)@VLDB 2021.

 

Yi Lu, Xiangyao Yu, Lei Cao and Samuel Madden. 2021. Epoch-based Commit and Replication in Distributed OLTP Databases. VLDB 2021.

 

Huayi Zhang, Lei Cao, Samuel Madden and Elke Rundensteiner. 2021. LANCET: Labeling Complex Data at Scale. VLDB 2021.

 

Lei Cao, Samuel Madden and Elke Rundensteiner. 2021. ATLANTIC: Making Database Differentially Private and Faster with Accuracy Guarantee. VLDB 2021. (Demo)

 

Guoliang Li, Xuanhe Zhou and Lei Cao. Machine Learning for Databases. VLDB 2021. (Tutorial)

 

Keyulu Xu, Mozhi Zhang, Stefanie Jegelka and Kenji Kawaguchi. 2021. Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and Depth. International Conference on Machine Learning (ICML) 2021.

 

Peiyuan Liao, Han Zhao, Keyulu Xu, Tommi Jaakkola, Geoffrey Gordon, Stefanie Jegelka and Ruslan Salakhutdinov. 2021. Information Obfuscation of Graph Neural Networks. International Conference on Machine Learning (ICML) 2021.

 

Ryan C Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh and Tim Kraska. 2021. Bao: Making Learned Query Optimization Practical. ACM SIGMOD/PODS 2021. (Data Management Best Paper Winner 2021) Watch the SIGMOD talk.

 

Parimarjan Negi (MIT CSAIL), Matteo Interlandi (Microsoft), Ryan Marcus (MIT CSAIL), Mohammad Alizadeh (Massachusetts Institute of Technology), Tim Kraska (MIT CSAIL), Marc Friedman (Microsoft) and Alekh Jindal (Microsoft). Steering Query Optimizers: A Practical Take on Big Data Workloads. ACM SIGMOD/PODS 2021. (Industry Honorable Mention 2021)

 

Tianyu Li,  Badrish Chandramouli, Jose Faleiro, Samuel Madden and Donald Kossmann. 2021. Asynchronous Prefix Recoverability for Fast Distributed Stores. ACM SIGMOD/PODS 2021.

 

Jialin Ding, Umar Farooq Minhas, Badrish Chandramouli, Chi Wang, Yinan Li, Ying Li, Donald Kossmann, Johannes Gehrke and Tim Kraska. 2021. Instance-Optimized Data Layouts for Cloud Analytics Workloads. ACM SIGMOD/PODS 2021.

 

Lujing Cen, Andreas Kipf, Ryan Marcus and Tim Kraska. 2021. LEA: A Learned Encoding Advisor for Column Stores. aiDM@SIGMOD’21

 

Guoliang Li, Xuanhe Zhou and Lei Cao. AI meets Database: AI4DB and DB4AI. ACM SIGMOD/PODS 2021. (Tutorial)

 

Huayi Zhang, Lei Cao, Peter VanNostrand, Samuel Madden and Elke Rundensteiner. 2021. ELITE: Robust Deep Anomaly Detection with Meta Gradient. ACM SIGKDD 2021.

 

Laurent Bindschaedler, Jasmina Malicevic, Baptiste Lepers, Ashvin Goel and Willy Zwaenepoel. 2021. Tesseract: Distributed, General Graph Pattern Mining on Evolving Graphs. In Proceedings of the 16th European Conference on Computer Systems (EuroSys 2021).

 

Laurent Bindschaedler, Andreas Kipf, Tim Kraska, Ryan Marcus and Umar Farooq Minhas.  2021. Towards a Benchmark for Learned Systems. In Proceedings of the 12th International Workshop on Self-Managing Database Systems (2021).   

 

Kapil Vaidya, Eric Knorr, Tim Kraska and Michael Mitzenmacher. 2021. Partitioned Learned Bloom Filter. International Conference on Learning Representations (ICLR) 2021.

 

Keyulu Xu, Mozhi Zhang, Jingling Li, Simon Shaolei Du, Ken-Ichi Kawarabayashi and Stefanie Jegelka. 2021. How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks. International Conference on Learning Representations (ICLR) 2021. (Oral Presentation)

 

Joshua David Robinson, Ching-Yao Chuang, Suvrit Sra and Stefanie Jegelka. 2021. Contrastive Learning with Hard Negative Samples. International Conference on Learning Representations (ICLR) 2021.

 

Hanrui Wang, Zhekai Zhang and Song Han. 2021.  SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning. 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA’21).

 

Eleni Tzirita Zacharatou, Andreas Kipf, Ibrahim Sabek, Varun Pandey, Harish Doraiswamy and Volker Markl. 2021. The Case for Distance-Bounded Spatial Approximations. CIDR 2021.

 

2020

Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan and Song Han. 2020. MCUNet: Tiny Deep Learning on IoT Devices.  Neural Information Processing Systems (NeurIPS) 2020. (Spotlight Presentation)

 

Han Cai, Chuang Gan, Ligeng Zhu and Song Han. 2020. Tiny Transfer Learning: Towards Memory-Efficient On-Device Learning. Neural Information Processing Systems (NeurIPS) 2020.

 

Shengyu Zhao, Zhijian Liu, Ji Lin, Jun-Yan Zhu and Song Han. 2020. Differentiable Augmentation for Data-Efficient GAN Training. Neural Information Processing Systems (NeurIPS) 2020.

 

Hussam Abu-Libdeh, Deniz Altinbüken, Alex Beutel, Ed H. Chi, Lyric Doshi, Tim Kraska, Xiaozhou Li, Andy Ly, and Christopher Olston. 2020. Learned Indexes for a Google-scale Disk-based Database. Workshop on ML for Systems at NeurIPS 2020.

 

Ching-Yao Chuang, Joshua Robinson, Yen-Chen Lin, Antonio Torralba and Stefanie Jegelka. 2020. Debiased Contrastive Learning. Neural Information Processing Systems (NeurIPS) 2020.  (Spotlight Presentation)

 

Sebastian Curi, Kfir Y. Levy, Stefanie Jegelka and Andreas Krause. 2020. Adaptive Sampling for Stochastic Risk-Averse Learning. Neural Information Processing Systems (NeurIPS) 2020.

 

Wenbo Tao, Xinli Hou, Adam Sah, Leilani Battle, Remco Chang and Michael Stonebraker. Kyrix-S: Authoring Scalable Scatterplot Visualizations of Big Data. IEEE Information Visualization (InfoVis) at VIS 2020.

 

Nadiia Chepurko, Ryan Marcus, Emanuel Zgraggen, Raul Castro Fernandez, Tim Kraska and David Karger. 2020. ARDA: Automatic Relational Data Augmentation for Machine Learning. VLDB 2020.

 

Tim Kraska. 2020. Towards Learned Algorithms, Data Structures, and Systems. AIDB 2020 Workshop at VLDB 2020.

 

Michael Cafarella, David DeWitt, Vijay Gadepally, Jeremy Kepner, Christos Kozyrakis, Tim Kraska, Michael Stonebraker and Matei Zaharia. 2020. A Polystore-Based Database Operating System (DBOS). Polystore ’20 Workshop at VLDB 2020.

 

Favyen Bastani, Oscar Moll and Samuel Madden. 2020. Vaas: Video Analytics at Scale. VLDB 2020. (Demo)

 

El Kindi Rezig, Ashrita Brahmaroutu, Nesime Tatbul, Mourad Ouzzani, Nan Tang, Timothy Mattson, Samuel Madden and Michael Stonebraker. 2020. Debugging Large-Scale Data Science Pipelines using Dagger. VLDB 2020. (Demo)

 

Yi Lu, Xiangyao Yu, Lei Cao and Samuel Madden. 2020. Aria: A Fast and Practical Deterministic OLTP Database. VLDB 2020.

 

Ching-Yao Chuang, Antonio Torralba and Stefanie Jegelka. 2020. Estimating Generalization under Distribution Shifts via Domain-Invariant Representations. International Conference on Machine Learning (ICML) 2020.

 

Lujing Cen, Ryan Marcus, Hongzi Mao, Justin Gottschlich, Mohammad Alizadeh and Tim Kraska. 2020. Learned Garbage Collection. MAPL 2020 at PLDI 2020.

 

Anil Shanbhag, Nesime Tatbul, David E. Cohen and Samuel Madden. 2020. Large-Scale In-Memory Analytics on Intel Optane DC Persistent Memory. 16th International Workshop on Data Management on New Hardware (DaMoN 2020) at 2020 ACM SIGMOD/PODS

 

Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska and Thomas Neumann. 2020. RadixSpline: A Single-Pass Learned Index. aiDM 2020 Workshop at 2020 ACM SIGMOD/PODS.

 

Philipp Eichmann, Emanuel Zgraggen, Carsten Binnig and Tim Kraska. 2020. FASTBench: A New Benchmark for Interactive Data Exploration. 2020 ACM SIGMOD/PODS.

 

Vikram Nathan, Jialin Ding, Tim Kraska and Mohammad Alizadeh. 2020. Learned Multi-Dimensional Indexing. 2020 ACM SIGMOD/PODS.

 

Matthias Jasny, Tobias Ziegler, Tim Kraska, Uwe Roehm and Carsten Binnig. 2020. DB4ML: An In-Memory Database Kernel with Machine Learning Support. 2020 ACM SIGMOD/PODS.

 

Matthew Perron, Raul Castro Fernandez, David DeWitt and Samuel Madden. 2020. Starling: A Scalable Query Engine on Cloud Function Services. 2020 ACM SIGMOD/PODS.

 

Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Hantian Zhang, Yinan Li, Jaeyoung Do, Donald Kossmann, Johannes Gehrke, David Lomet, Badrish Chandramouli and Tim Kraska. 2020. ALEX: An Updatable Adaptive Learned Index. 2020 ACM SIGMOD/PODS. Get the code

 

Ryan Marcus, Emily Zhang and Tim Kraska. 2020. CDFShop: Exploring and Optimizing Learned Index Structures. 2020 ACM SIGMOD/PODS. (demo paper)

 

Anil Shanbhag, Samuel Madden and Xiangyao Yu.  2020. A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics. 2020 ACM SIGMOD/PODS.

 

Favyen Bastani, Songtao He, Arjun Balasingam, Karthik Gopalakrishnan, Mohammad Alizadeh, Hari Balakrishnan, Michael Cafarella, Tim Kraska and Sam Madden. 2020. MIRIS: Fast Object Track Queries in Video. 2020 ACM SIGMOD/PODS.

 

Chengliang Chai, Lei Cao, Guoliang Li, Jian Li, Yuyu Luo and Samuel Madden. 2020. Human-in-the-Loop Outlier Detection. 2020 ACM SIGMOD/PODS.

 

Erfan Zamanian, Julian Shun, Carsten Binnig and Tim Kraska. 2020. Chiller: Contention-centric Transaction Execution and Data Partitioning for Modern Networks. 2020 ACM SIGMOD/PODS.

 

Lei Cao, Huayi Zhang, Yizhou Yan, Samuel Madden and Elke A. Rundensteiner. 2020. Continuously Adaptive Similarity Search.  2020 ACM SIGMOD/PODS

 

Haotian Tang, Zhijian Liu, Shengyu Zhao, Yujun Lin, Ji Lin, Hanrui Wang and Song Han. 2020. Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution. ECCV’20.

 

Zhijian Liu, Zhanghao Wu, Chuang Gan, Ligeng Zhu and Song Han. 2020. DataMix: Efficient Privacy-Preserving Edge-Cloud Inference. ECCV’20.

 

Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan and Song Han. 2020. Hardware-Aware Transformers for Efficient Natural Language Processing. ACL’20. Talk at Microsoft Research.

 

Muyang Li, Ji Lin, Yaoyao Ding, Zhijian Liu, Jun-Yan Zhu and Song Han. 2020. GAN Compression: Learning Efficient Architectures for Conditional GANs. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020.

 

Parimarjan Negi, Ryan Marcus, Hongzi Mao, Nesime Tatbul, Tim Kraska and Mohammad Alizadeh. 2020. Cost-Guided Cardinality Estimation: Focus Where it Matters.  Self-Managing Database Systems 2020 (SMDB 2020).

 

Johannes Kirschner, Ilija Bogunovic, Stefanie Jegelka and Andreas Krause. 2020. Distributionally Robust Bayesian Optimization. International Conference on Artificial Intelligence and Statistics (AISTATS) 2020.

 

Keyulu Xu, Jingling Li, Mozhi Zhang, Simon S. Du, Ken-ichi Kawarabayashi and Stefanie Jegelka. 2020. What Can Neural Networks Reason About? International Conference on Learning Representations 2020 (ICLR 2020). (Spotlight)

 

Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang and Song Han. 2020. Once For All: Train One Network and Specialize It for Efficient Deployment. International Conference on Learning Representations 2020 (ICLR 2020).

 

Zhanghao Wu, Zhijian Liu, Ji Lin, Yujun Lin and Song Han. 2020. Lite Transformer with Long Short Term Attention. International Conference on Learning Representations 2020 (ICLR 2020).

 

Zhekai Zhang, Hanrui Wang, Song Han and William J. Dally. 2020. SpArch: Efficient Architecture for Sparse Matrix Multiplication. International Symposium on High-Performance Computer Architecture (HPCA) 2020.

 

Vikram Nathan, Jialin Ding, Mohammad Alizadeh and Tim Kraska. 2020. Learning Multi-dimensional Indexes. Northeast Database Day 2020.

 

Ryan Marcus and Tim Kraska. 2020. Learning to Multiplex Simple Query Optimizers. Northeast Database Day 2020.

 

Matthew J. Perron, Raul Castro Fernandez, David DeWitt and Samuel Madden. 2020. Starling: How to Build a Query Engine on Cloud Functions. Northeast Database Day 2020.

 

Darryl Ho, Jialin Ding, Sanchit Misra, Nesime Tatbul, Vikram Nathan, Vasimuddin Md and Tim Kraska. 2020. LISA: Towards Learned DNA Sequence Search. Northeast Database Day 2020 (poster).

 

Mengyuan Sun, Joana M. F. da Trindade, Samuel Madden, Julian Shun and Nesime Tatbul. 2020. In-memory Graph Partitioning for Efficient Temporal Graph Analytics on NVRAM. Northeast Database Day 2020 (poster).

 

Erfan Zamanian, Xiangyao Yu, Michael Stonebraker and Tim Kraska. 2020. Rethinking Database High Availability with RDMA Networks. Northeast Database Day 2020 (poster).

 

El Kindi Rezig, Lei Cao, Giovanni Simonini, Maxime Schoemans, Samuel Madden, Nan Tang, Mourad Ouzzani and Michael Stonebraker. 2020. Dagger: A Data (not code) Debugger. Conference on Innovative Data Systems (CIDR) 2020.

 

2019

Hongzi Mao, Malte Schwarzkopf, Hao He and Mohammad Alizadeh. 2019. Towards Safe Online Reinforcement Learning in Computer Systems.  Machine Learning for Systems Workshop at Neural Information Processing Systems (NeurIPS) 2019.   

 

Vikram Nathan. Learned Multi-dimensional Indexing. Machine Learning for Systems Workshop at Neural Information Processing Systems (NeurIPS) 2019. (Contributed talk).

 

Haonan Wang, Hao He, Mohammad Alizadeh and Hongzi Mao. 2019. Learning Caching Policies with Subsampling. Machine Learning for Systems Workshop at Neural Information Processing Systems (NeurIPS) 2019.   

 

Zeyuan Shang, Emanuel Zgraggen and Tim Kraska. 2019. Alpine Meadow: A System for Interactive AutoML. MLSys: Workshop on Systems for ML at Neural Information Processing Systems (NeurIPS) 2019.

 

Zeyuan Shang, Emanuel Zgraggen, Philipp Eichmann and Tim Kraska. 2019. Niseko: a Large-Scale Meta-Learning Dataset. Workshop on Meta-Learning, Workshop on Systems for ML at Neural Information Processing Systems (NeurIPS) 2019.

 

Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska and Thomas Neumann. 2019. SOSD: A Benchmark for Learned Indexes. MLSys: Workshop on Systems for ML at Neural Information Processing Systems (NeurIPS) 2019. [GitHub]

 

Darryl Ho, Jialin Ding, Sanchit Misra, Nesime Tatbul, Vikram Nathan, Vasimuddin Md and Tim Kraska. 2019. LISA: Towards Learned DNA Sequence Search. MLSys: Workshop on Systems for ML at Neural Information Processing Systems (NeurIPS) 2019. (selected for oral presentation)

 

Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao and Mohammad Alizadeh. 2019. Learning Generalizable Device Placement Algorithms for Distributed Machine Learning. Neural Information Processing Systems (NeurIPS) 2019.

 

Zhijian Liu, Haotian Tang, Yujun Lin and Song Han. 2019. Point-Voxel CNN for Efficient 3D Deep Learning. Neural Information Processing Systems (NeurIPS) 2019.

 

Ligeng Zhu, Zhijian Liu and Song Han. 2019. Deep Leakage from Gradients. Neural Information Processing Systems (NeurIPS) 2019.

 

Joshua Robinson, Suvrit Sra and Stefanie Jegelka. 2019. Flexible Modeling of Diversity with Strongly Log-Concave Distributions. Neural Information Processing Systems (NeurIPS) 2019.

 

Matthew Staib and Stefanie Jegelka. 2019. Distributionally Robust Optimization and Generalization in Kernel Methods. Neural Information Processing Systems (NeurIPS) 2019.

 

Hongzi Mao, Parimarjan Negi, Akshay Narayan, Hanrui Wang, Jiacheng Yang, Haonan Wang, Ryan Marcus, Ravichandra Addanki, Mehrdad Khani, Songtao He, Vikram Nathan, Frank Cangialosi, Shaileshh Bojja Venkatakrishnan, Wei-Hung Weng, Song Han, Tim Kraska and Mohammad Alizadeh. 2019. Park: An Open Platform for Learning-Augmented Computer Systems. Neural Information Processing Systems (NeurIPS) 2019. [code] [blog post]

 

Ji Lin, Chuang Gan, Song Han. 2019. TSM: Temporal Shift Module for Efficient Video Understanding. International Conference on Computer Vision (ICCV). [paper][demo][code][industry integration]

 

Lei Cao, Wenbo Tao, Sungtae An, Jing Jin (Massachusetts General Hospital), Yizhou Yan, Xiaoyu Liu, Wendong Ge (Massachusetts General Hospital), Adam Sah, Leilani Battle, Jimeng Sun, Remco Chang, Brandon Westover (Massachusetts General Hospital), Samuel Madden, Michael Stonebraker. 2019. Smile: A System to Support Machine Learning on EEG Data at Scale. VLDB 2019. (Industry Track paper)

 

El Kindi Rezig, Lei Cao, Michael Stonebraker, Giovanni Simonini, Wenbo Tao, Samuel Madden, Mourad Ouzzani, Nan Tang, Ahmed Elmagarmid. 2019. Data Civilizer 2.0: A Holistic Framework for Data Preparation and Analytics. VLDB 2019. (Demo paper)

 

Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang,  Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, Nesime Tatbul.  2019. “Neo: A Learned Query Optimizer.” VLDB 2019. [pdf]

 

Zeyuan Shang, Emanuel Zgraggen, Benedetto Buratti, Ferdinand Kossmann, Philipp Eichmann, Yeounoh Chung, Carsten Binnig, Eli Upfal, Tim Kraska. 2019. Democratizing Data Science through Interactive Curation of ML Pipelines. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD '19). ACM, New York, NY, USA, 1171-1188. DOI

 

Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, Tim Kraska. 2019. FITing-Tree: A Data-aware Index Structure. In arXiv, 2019. [project page] [pdf]

 

Tobias Ziegler, Sumukha Tumkur Vani, Carsten Binnig, Rodrigo Fonseca, Tim Kraska. 2019. Designing Distributed Tree-based Index Structures for Fast RDMA-capable Networks. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD '19). ACM, New York, NY, USA, 741-758. DOI

 

Stratos Idreos and Tim Kraska. 2019. From Auto Tuning One Size Fits All to Self-Designed and Learned Data Intensive Systems.  In Proceedings of the 2019 International Conference on Management of Data (SIGMOD '19). ACM, New York, NY, USA, 2054-2059. DOI

 

El Kindi Rezig, Mourad Ouzzani, Ahmed K. Elmagarmid, Walid G. Aref and Michael Stonebraker. 2019. Towards an End-to-End Human-Centric Data Cleaning Framework. HILDA@SIGMOD 2019: 1:1-1:7

 

Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, Mohammad Alizadeh. 2019. Learning Scheduling Algorithms for Data Processing Clusters. SIGCOMM 2019: 270-288. [project page]

 

Vikram Nathan, Vibhaalakshmi Sivaraman, Ravichandra Addanki, Mehrdad Khani, Prateesh Goyal, Mohammad Alizadeh. 2019. End-to-End Transport for Video QoE Fairness. SIGCOMM 2019.

 

Hongzi Mao, Akshay Narayan, Parimarjan Negi, Hanrui Wang, Jiacheng Yang, Haonan Wang, Mehrdad Khani, Songtao He, Ravichandra Addanki, Ryan Marcus, Frank Cangialosi, Wei-Hung Weng, Song Han, Tim Kraska, Mohammad Alizadeh. 2019. Park: An Open Platform for Learning Augmented Computer Systems. RL for Real Life ICML 2019 Workshop (Best paper award).

 

Hongzi Mao, Shannon Chen, Drew Dimmery, Shaun Singh, Drew Blaisdell, Yuandong Tian, Mohammad Alizadeh, Eytan Bakshy. 2019.  Real-world Video Adaptation with Reinforcement Learning. RL for Real Life ICML 2019 Workshop.

 

Amy Zhao, Guha Balakrishnan, Fredo Durand, John V. Guttag, Adrian V. Dalca. 2019. Data Augmentation Using Learned Transformations for One-Shot Medical Image Segmentation. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 8543-8553

 

Fredrik D. Johansson, Rajesh Ranganath, David Sontag. 2019. Support and Invertibility in Domain-Invariant Representations. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics (AISTATS) (To appear), 2019.

 

Omer Gottesman, Fredrik D. Johansson, Matthieu Komorowski, Aldo Faisal, David Sontag, Finale Doshi-Velez, Leo Anthony Celi. 2019. Guidelines for reinforcement learning in healthcare. Nature Medicine, 25(1): 16–18. 2019.

 

Lei Cao, Yizhou Yan, Samuel Madden, Elke Rundensteiner, Mathan Gopalsamy. 2019. Efficient Discovery of Sequence Outlier Patterns. PVLDB, 12(8): 920-932, 2019

 

Charlotte Bunne, David Alvarez-Melis, Andreas Krause, Stefanie Jegelka. Learning Generative Models across Incomparable Spaces. 2019. International Conference on Machine Learning (ICML), 2019.

 

Wenbo Tao, Xiaoyu Liu, Yedi Wang, Leilani Battle, Cagatay Demiralp, Remco Chang, Michael Stonebraker. Kyrix: Interactive Pan/Zoom Visualizations at Scale. 2019. Eurographics Conference on Visualization (EuroVis) 2019.

 

Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han. HAQ: Hardware-Aware Automated Quantization with Mixed Precision. 2019. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. (Oral presentation)

 

Matthew Staib, Bryan Wilder, Stefanie Jegelka. Distributionally Robust Submodular Maximization. 2019. International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.

 

Kevin Hu, Michiel A. Bakker, Stephen Li, Tim Kraska, César Hidalgo. VizML: A Machine Learning Approach to Visualization Recommendation. 2019. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), 2019. Read a summary.

 

Keyulu Xu, Weihua Hu, Jure Leskovec, Stefanie Jegelka. How Powerful are Graph Neural Networks? 2019. International Conference on Learning Representations (ICLR), 2019. (Oral Presentation)

 

Yeounoh Chung, Tim Kraska, Steven Euijong Whang, Neoklis Polyzotis. Slice Finder: Automated Data Slicing for Model Validation. 2019. IEEE International Conference on Data Engineering (ICDE), 2019.  (Short paper version to appear.)

 

Han Cai, Ligeng Zhu, Song Han. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. 2019. To appear in ICLR’19 (Seventh International Conference on Learning Representations).

 

Ji Lin, Chuang Gan, Song Han. Defensive Quantization: When Efficiency Meets Robustness. 2019. To appear in ICLR’19  (Seventh International Conference on Learning Representations)

 

Hongzi Mao, Shaileshh Bojja Venkatakrishnan, Malte Schwarzkopf, Mohammad Alizadeh. Variance Reduction for Reinforcement Learning in Input-Driven Environments. 2019. To appear in ICLR’19 (Seventh International Conference on Learning Representations).

 

Raul Castro Fernandez and Samuel Madden. Multi-Modal Data Exploration with a Learned Relational Embedding. North East Database Day 2019.

 

Ryan Marcus and Olga Papaemmanouil. Making ML-for-DB Easier: Operator Embeddings via Deep Learning. North East Database Day 2019.

 

Jialin Ding. Learned Multi-Dimensional Index for Data Warehouses. North East Database Day 2019. (Poster)

 

Leonhard Spiegelberg and Tim Kraska. Robust Data Centric Code Generation. North East Database Day 2019. (Poster)

 

Ani Kristo, Kapil Vaidya, Ugur Cetintemel, Tim Kraska. A Learned Sorting Algorithm. North East Database Day 2019. (Poster)

 

Manasi Vartak. Data Science is Growing Up: Building the DS/ML Infrastructure Backbone. North East Database Day 2019. (Poster)

 

Wenbo Tao, Xiaoyu Liu, Cagatay Demiralp, Remco Chang, Michael Stonebraker. Kyrix: Interactive Visual Data Exploration at Scale. 2019. Conference on Innovative Data Systems Research (CIDR) 2019. 

 

Tim Kraska, Mohammad Alizadeh, Alex Beutel, Ed Chi, Ani Kristo, Guillaume Leclerc, Samuel Madden, Hongzi Mao, Vikram Nathan. SageDB: A Learned Database System. 2019. Conference on Innovative Data Systems Research (CIDR) 2019.

 

2018

Songtao He, Favyen Bastani, Sofiane Abbar, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Samuel Madden. RoadRunner: Improving the Precision of Road Network Inference from GPS Trajectories. 2018. ACM SIGSPATIAL, Seattle, WA, November 2018. [PDF] [BibTex]

 

Favyen Bastani, Songtao He, Sofiane Abbar, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Samuel Madden. Machine-Assisted Map Editing. 2018. ACM SIGSPATIAL, Seattle, WA, November 2018. [PDF] [BibTex]

 

Irene Chen, Fredrik D. Johansson, David Sontag. Why is My Classifier Discriminatory? 2018. Neural Information Processing Systems (NeurIPS), 2018. Spotlight. Watch video

 

Hongzhou Lin and Stefanie Jegelka. ResNet with one-neuron hidden layers is a Universal Approximator. 2018. Neural Information Processing Systems (NeurIPS), 2018. Spotlight.

 

Ilija Bogunovic, Jonathan Scarlett, Stefanie Jegelka, Volkan Cevher. 2018. Adversarially Robust Optimization with Gaussian Processes. Neural Information Processing Systems (NeurIPS), 2018. Spotlight.

 

Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken Kawarabayashi, Stefanie Jegelka. 2018. Representation Learning on Graphs with Jumping Knowledge Networks. International Conference on Machine Learning (ICML), 2018. Long talk.

 

Harini Suresh, Jen J. Gong, John V. Guttag. 2018. Learning Tasks for Multitask Learning: Heterogeneous Patient Populations in the ICU. Proceedings of the Knowledge Discovery and Data Mining Conference (KDD 2018), 2018.  

 

Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18).

 

Manasi Vartak, Joana M. F. da Trindade, Samuel Madden, Matei Zaharia. 2018. MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis.  In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18).

 

Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han. 2018. AMC: AutoML for Model Compression and Acceleration on Mobile Devices. ECCV’18 (European Conference on Computer Vision).

 

Michael L. Brodie, Michael Stonebraker, Ricardo Mayerhofer, Jialing Pei. The Case for the Co-evolution of Applications and Data, North East Database Day 2018 (NEDB 2018), January 19, 2018.

 

Yizhou Yan, Lei Cao, Samuel Madden, Elke Rundensteiner. 2018. SWIFT: Mining Representative Patterns from Large Event Streams. PVLDB, 12(3): 265-277, 2018.

 

Tim Kraska. Northstar: An Interactive Data Science System. 2018. PVLDB 11(12): 2150-2164 (2018)

 

Carsten Binnig, Benedetto Buratti, Yeounoh Chung, Cyrus Cousins, Tim Kraska, Zeyuan Shang, Eli Upfal, Robert C. Zeleznik, Emanuel Zgraggen. 2018. Towards Interactive Curation & Automatic Tuning of ML Pipelines. DEEM@SIGMOD 2018: 1:1-1:4