Applied Data Science Invited Speakers
Ricard Gavaldà
Tuesday, August 27th, 11:00 AM – 1:00 PM
Bio: Ricard Gavaldà is a full professor, currently on leave, at the Department of Computer Science of the Universitat Politècnica de Catalunya – BarcelonaTech (UPC). His research for over 20 years has focused on Machine Learning, both in theoretical and algorithmic aspects and in applications to real scenarios. On the algorithmic side he has worked on stream-mining techniques and on frequent pattern discovery. On the more applied side, from UPC he has led or participated in knowledge-transfer projects in domains such as social media analysis, smart cities, fraud detection, customer churning, and energy efficiency.
Since the mid-2010s, his main interest has been in the application of Machine Learning and complex pattern analysis to data from healthcare systems, in order to make them more efficient, fair, and safe. In 2017 he co-founded the startup Amalfi Analytics, whose aim is to transform the practice of clinical management by exploiting existing data, and where he has been working full-time since March 2020. Amalfi Analytics provides advanced analytic tools that allow clinicians and managers in healthcare organizations to make better decisions by analyzing trade-offs between resources, risks, and clinical outcomes involving patients, staff, and healthcare organizations as a whole.
Alex Jaimes
Tuesday, August 27th, 11:00 AM – 1:00 PM
Bio: Alex is Chief AI Officer at Dataminr, where he’s in charge of Engineering, Data, and AI. A leader in AI, Alex has, as an engineering executive and scientist, built and led AI teams at large companies such as Yahoo and at several startups, where he has led efforts to build AI products used by millions of people across multiple B2C and B2B industries (real-time event detection/emergency response, healthcare, self-driving cars, media, telecom, etc.). He has 20+ years of international experience in research (Columbia U., KAIST) and product impact at scale (Yahoo, Telefónica, IBM, Fuji Xerox, Siemens, AT&T Bell Labs, DigitalOcean, and IDIAP-EPFL) in the USA, Japan, Chile, Switzerland, Spain, and South Korea. He has been a professor (KAIST, South Korea), and has 100+ patents and publications (h-index 43) in top-tier conferences and journals on diverse topics in AI. His work has received ~9K citations and he has been featured widely in the press (MIT Technology Review, CNBC, Vice, TechCrunch, Yahoo! Finance, etc.). He has given 100+ invited talks at top academic and industry conferences (Colombia 4.0, UN AI for Good Global Summit, ICML & NeurIPS workshops, KDD, O’Reilly AI, Strata, Velocity, the Deep Learning Summit (Re-Work), Tech Open Air, the Future of Technology Summit, CogX, Stanford, Cornell, & Columbia Universities, etc.). He is a mentor at Endeavor (which leads the high impact entrepreneurship movement around the world) and Techstars; he was a member of the advisory board of Digital Divide Data (a not-for-profit that creates sustainable tech opportunities for underserved youth, their families, and their communities in Asia and Africa), and was an early voice in Human-Centered AI (Computing). He was one of ten experts in the Colombian Government’s Artificial Intelligence Expert Mission, which evaluated and produced concrete short-, medium-, and long-term recommendations for implementing an AI Policy.
Alex is an active member of the research community, publishing at and serving on the program committees of several top-tier conferences. He holds a Ph.D. and an M.S. from Columbia University.
Michael May
Tuesday, August 27th, 2:00 PM – 4:00 PM
Title: Research Challenges in Industrial AI – from engineering design to shopfloor operations
Abstract: The field of Industrial AI creates a number of unique research & application challenges. A first cluster is centered around the combination of data-driven machine learning and physics-based methods. For example, we can speed up automotive or turbine design that relies on traditional simulation, sometimes by orders of magnitude, by using ML-based surrogate models or by reducing the search space with Bayesian Learning. A second cluster centers around problems of operations & control, e.g. on the shopfloor. Examples are visual quality inspection and many forms of anomaly detection. In this area, Generative AI promises to break a number of existing barriers to the scalability and transferability of solutions. A third important dimension is the need for safe & reliable or “industrial grade” AI. In my presentation I will discuss examples from various industrial domains and point towards some open challenges.
Bio: Dr. Michael May leads Industrial Data Analytics & AI Research at Siemens AG. He is based in Munich and heads a global AI department with fourteen research groups in Europe, the US, and Asia. Before joining Siemens in 2013, Michael was Head of the Knowledge Discovery Department at the Fraunhofer Institute for Intelligent Analysis and Information Systems in Bonn, Germany. He was responsible for ML projects with different sectors of German industry, from telecommunications to automotive and finance, and led a number of EU-wide Research Networks of Excellence on Machine Learning and Knowledge Discovery. He obtained his PhD from the Graduate Program Cognitive Science at the University of Hamburg in 1997. Michael was the local chair of the 22nd International Conference on Machine Learning, ICML 2005, in Bonn, as well as of further scientific and industrial conferences & workshops in the field.
Pedro Bizarro
Tuesday, August 27th, 2:00 PM – 4:00 PM
Title: Lessons learned while running ML models in harsh environments
Abstract: A very large payment processor client once told us: ‘if we are down for 5 minutes, we open the evening news – so don’t screw up’. Processing billions of dollars per day, many financial institutions need to continuously fight organized crime in the form of transaction fraud, stolen cards, money laundering, account opening fraud, impersonation scams, phishing, and many other exotic and ever-changing attacks from organized crime groups worldwide. In fact, it is estimated that in 2023 the global losses in fraud scams and bank fraud reached $485.6 billion. However, in addition to having very good detection rates and very low false positive rates, financial institutions also need to maintain very high availability rates, very low latencies, very high throughputs, automatic fault tolerance, automatic scaling up and down, and more. In this talk we cover some lessons related to running ML models in harsh, mission-critical environments. We describe data issues, scale issues, ethical issues, system issues, security issues, compliance issues, business and regulation issues, and some architectural tradeoffs and architectural evolutions.
Bio: Pedro Bizarro is co-founder and Chief Science Officer of Feedzai, where he leads the Research department. Drawing on a history in academia and research, Pedro helped to develop Feedzai’s industry-leading RiskOps platform to fight financial fraud using innovations from Research. Pedro is also an invited Visiting Professor at Universidade de Lisboa – IST and a Member of the Global Innovator Programme at the World Economic Forum. He has been an Assistant Professor at the University of Coimbra, a visiting professor at Carnegie Mellon University, and a Fulbright Fellow, and holds a Computer Science PhD from the University of Wisconsin-Madison. Pedro’s main interests are high performance systems for data processing, machine learning, responsible AI, and data visualization. Pedro is also an avid runner and an Ironman.
Haixun Wang
Wednesday, August 28th, 11:00 AM – 1:00 PM
Bio: Haixun Wang is an IEEE Fellow, Editor-in-Chief of the IEEE Data Engineering Bulletin, and VP of Engineering and Distinguished Scientist at Instacart. Previously, he held similar roles at WeWork and Amazon, and led the NLP team at Facebook. From 2013 to 2015, he worked on NLP at Google Research. Earlier, at Microsoft Research Asia, he led research in semantic search and graph data processing from 2009 to 2013. At IBM T. J. Watson Research Center from 2000 to 2009, he was a research staff member and a technical assistant to the VP of IBM Research. He earned his Ph.D. in Computer Science from UCLA in 2000, and has published over 200 research papers. He has chaired conferences such as SIGKDD ‘ 1 and serves on editorial boards for TKDE, etc. His awards include the ICDE 10-Year Influential Paper in 2024, Best Paper at ICDE 2015, the ICDM 10-Year Best Paper in 2013, and Best Paper at ER 2009.
Luna Dong
Wednesday, August 28th, 11:00 AM – 1:00 PM
Bio: Xin Luna Dong is a Principal Scientist at Meta Reality Labs, leading the ML efforts in building an intelligent personal assistant. She has spent more than a decade building knowledge graphs, such as the Amazon Product Graph and the Google Knowledge Graph. She has co-authored the books “Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases” and “Big Data Integration”. She was named an ACM Fellow and an IEEE Fellow for “significant contributions to knowledge graph construction and data integration”, and was awarded the VLDB Women in Database Research Award and the VLDB Early Career Research Contribution Award. She serves on the PVLDB advisory committee, was a member of the VLDB Endowment, and was a PC co-chair for the KDD’2022 ADS track, WSDM’2022, VLDB’2021, and SIGMOD’2018.
Hema Raghavan
Wednesday, August 28th, 2:00 PM – 4:00 PM
Title: Scalable Graph Learning for your Enterprise
Abstract: Much of the world’s most valued data is stored in relational databases and data warehouses, where the data is organized into many tables connected by primary-foreign key relations. However, building machine learning models using this data is both challenging and time-consuming. The core problem is that no machine learning method is capable of learning on multiple tables interconnected by primary-foreign key relations. Current methods can only learn from a single table, so the data must first be manually joined and aggregated into a single training table, a process known as feature engineering. Feature engineering is slow and error-prone, and it leads to suboptimal models. At Kumo we have developed an end-to-end deep representation learning approach to directly learn on data laid out across multiple tables. We name our approach Relational Deep Learning (RDL). The core idea is to view relational databases as a temporal, heterogeneous graph, with a node for each row in each table, and edges specified by primary-foreign key links. Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all input data, without any manual feature engineering. Relational Deep Learning leads to more accurate models that can be built much faster. Furthermore, we have found that once the graph is constructed, almost all machine learning problems in enterprises can be reduced to node prediction or link prediction problems. This observation led us to create predictive SQL (pSQL), a very simple DSL that is extremely expressive and has a SQL-like syntax that a data scientist can use to define a predictive problem on the graph. pSQL has been successfully used by several enterprises to iterate quickly on graph learning for problems ranging from recommender systems, marketing, and sales to fraud and abuse. We will also discuss how Kumo’s graph engine scales to graphs with over 50B entities.
We will also talk about how we have built Kumo to be compliant from a security and privacy standpoint as well as provide sufficient explainability for a business to have confidence in the model. The team at Kumo also leads two open source initiatives to further research in this area. PyG.org is an open source graph learning framework used by researchers worldwide. Additionally, Kumo has worked with several academic partners to develop RelBench, a set of benchmark datasets ranging from discussions on Stack Exchange to book reviews on the Amazon Product Catalog. RelBench also has an implementation of Relational Deep Learning.
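The graph view described in the abstract (one node per row, one edge per primary-foreign key link) can be illustrated with a minimal Python sketch. This is not Kumo's implementation and does not use the PyG or RelBench APIs: the tables, column names, and the helpers `build_graph` and `mean_aggregate` are hypothetical, the temporal dimension is left out, and the aggregation step is only a hand-written stand-in for a learned message-passing layer.

```python
from collections import defaultdict

# Hypothetical toy tables; each row is a dict keyed by column name.
customers = [
    {"customer_id": 1, "country": "ES"},
    {"customer_id": 2, "country": "US"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 5.0},
]


def build_graph(tables, primary_keys, foreign_keys):
    """Return (nodes, edges): one node per row, one edge per PK-FK link.

    tables:       {table_name: [row_dict, ...]}
    primary_keys: {table_name: pk_column}
    foreign_keys: [(child_table, fk_column, parent_table), ...]
    """
    nodes = {}
    for name, rows in tables.items():
        pk = primary_keys[name]
        for row in rows:
            # Node identity is (table, primary key), so node types stay distinct.
            nodes[(name, row[pk])] = row
    edges = []
    for child, fk_col, parent in foreign_keys:
        pk = primary_keys[child]
        for row in tables[child]:
            src = (child, row[pk])
            dst = (parent, row[fk_col])
            if dst in nodes:  # skip dangling foreign keys
                edges.append((src, dst))
    return nodes, edges


def mean_aggregate(nodes, edges, column):
    """Simplified message-passing step: average `column` over incoming edges."""
    incoming = defaultdict(list)
    for src, dst in edges:
        incoming[dst].append(nodes[src][column])
    return {dst: sum(vals) / len(vals) for dst, vals in incoming.items()}


nodes, edges = build_graph(
    {"customers": customers, "orders": orders},
    {"customers": "customer_id", "orders": "order_id"},
    [("orders", "customer_id", "customers")],
)
avg_amount = mean_aggregate(nodes, edges, "amount")
```

Here `avg_amount` holds, for each customer node, the mean order amount of the order rows linked to it. It is the kind of per-entity aggregate that manual feature engineering would produce with a join and a group-by, computed directly over the graph instead.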
Bio: Hema Raghavan is Vice President of Engineering and Co-founder of Kumo AI where she is responsible for developing the AI technology to help Kumo users build better ML models. Previously, Raghavan was Senior Director of Engineering at LinkedIn where she led a globally distributed diverse team that built AI and ML solutions for fueling LinkedIn’s growth, including People You May Know and the company’s Air Traffic Controller AI that governed member communications. She has also worked as a Research Staff Member at IBM and a Scientist at Yahoo!. Raghavan has a PhD in Computer Science from the University of Massachusetts Amherst, and a degree in Computer Engineering from the University of Mumbai.
Dragos Margineantu
Wednesday, August 28th, 2:00 PM – 4:00 PM
Bio: Dragos Margineantu is a Boeing Senior Technical Fellow and Artificial Intelligence (AI) Chief Technologist who is the technical lead of AI research and engineering at Boeing.
His interests include computational methods for robust systems, autonomous commercial flight, anomaly and surprise detection & handling, reasoning under uncertainty, validation and testing of decision systems, cost-sensitive, active, and ensemble learning, and inverse reinforcement learning.
Dragos was one of the pioneers in research on ensemble learning and cost-sensitive learning and on statistical testing of learned models.
At Boeing, he developed machine learning based solutions for autonomous flight, manufacturing, airplane maintenance, airplane performance, surveillance, and security.
Dragos Margineantu was the program chair of the KDD 2015 Applied Data Science track, served as the Boeing principal investigator (PI) of multiple DARPA research projects, and is the Action Editor for Special Issues for the Machine Learning journal.
He co-advised graduate students at Massachusetts Institute of Technology (MIT) and KU Leuven in Belgium, and served on Canada Research Chair committees and on NSF review panels. Together with Mohamed Zaki and Sanjay Chawla, he started and has co-chaired the Machine Learning Data Analytics Symposia (MLDAS) series since 2014.
In his free time Dragos coaches middle schoolers for mathematics competitions and enjoys nature photography.
Dragos Margineantu earned a Ph.D. in Computer Science/Machine Learning from Oregon State University in 2001.