ICDM 2011 Invited Speakers
Dr. Cynthia Dwork, Distinguished Scientist, Microsoft Research
Title: The Promise of Differential Privacy
Abstract: "Differential privacy" describes a promise, made by a data curator to a data subject: you will not be affected, adversely or otherwise, by allowing your data to be used in any study, no matter what other studies, data sets, or information from other sources is available. At their best, differentially private mechanisms can make confidential data widely available for accurate data mining, without resorting to data clean rooms, institutional review boards, restricted views, or data protection plans. Nonetheless, a fundamental limit exists: overly accurate answers to too many questions will destroy privacy. Differentially private access provides a measure of cumulative privacy risk, permitting data access to be interdicted before privacy is destroyed. The goal of algorithmic research on differential privacy is to postpone this inevitability as long as possible. This talk surveys the current state of the art.
Cynthia Dwork, Distinguished Scientist at Microsoft Research, is the world's foremost expert on placing privacy-preserving data analysis on a mathematically rigorous foundation. A cornerstone of this work is differential privacy, a strong privacy guarantee permitting highly accurate data analysis. Dr. Dwork has also made seminal contributions in cryptography and distributed computing, and is a recipient of the Edsger W. Dijkstra Prize, recognizing some of her earliest work establishing the pillars on which every fault-tolerant system has been built for decades. She is a member of the US National Academy of Engineering and a Fellow of the American Academy of Arts and Sciences.
Dr. C. Lee Giles
Title: Data Mining and Information Extraction for CiteSeerX and Friends
Abstract: Cyberinfrastructure, or e-science, has become crucial in many areas of science where data access often defines scientific progress. Open source (OS) systems have greatly facilitated the design and implementation of supporting cyberinfrastructure, permitting the design of specialized integrated search engines and digital libraries that offer many opportunities for domain-relevant information and knowledge extraction, such as citation extraction, automated indexing and ranking, chemical formulae search, table indexing, etc. We describe the open source SeerSuite architecture, a modular, extensible system built on successful OS projects such as Lucene/Solr, and discuss issues in building domain-specific enterprise search and cyberinfrastructure for the sciences and academia. Because of the large amount of information crawled and/or searched, there are many scale problems in information extraction and data mining, such as author and entity disambiguation, data extraction and ranking, etc. We highlight application domains with examples from computer science (CiteSeerX), chemistry (ChemXSeer), and related problem areas.
Because such enterprise systems require unique information extraction approaches, several different machine learning methods, such as conditional random fields, support vector machines, mutual-information-based feature selection, sequence mining, etc., are critical for performance. We draw lessons for other e-science and cyberinfrastructure systems in terms of design, implementation, and research, and discuss future directions, systems, and research.
Dr. C. Lee Giles holds the David Reese Professorship at the Pennsylvania State University, University Park, PA, with appointments in the College of Information Sciences and Technology, Computer Science and Engineering, and Supply Chain and Information Systems. He is or has been on the following program committees: KDD, SIGIR, WWW, JCDL, ICML, NIPS, AAAI. He is a Fellow of the ACM, IEEE and INNS. He is probably best known for his work on estimating the size of the web and on the search engine and digital library CiteSeer, which he co-created, developed, and maintained. He has published over 300 refereed articles.
Dr. Renée J. Miller
Title: On Schema Discovery
Abstract: Structured data is distinguished from unstructured data by the presence of a schema describing the logical structure and semantics of the data. The schema is the means through which we understand and query the underlying data. Schemas enable data independence. In this talk, I consider new challenges in the old problem of schema discovery. I'll discuss the changing role of schemas from prescriptive to descriptive. I'll use examples from Web data publishing and from Business Analytics to motivate the automation of schema discovery and maintenance.
Renée J. Miller received BS degrees in Mathematics and in Cognitive Science from the Massachusetts Institute of Technology. She received her MS and PhD degrees in Computer Science from the University of Wisconsin in Madison, WI. She received the Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the United States government on outstanding scientists and engineers beginning their careers. She received the National Science Foundation Early Career Award. She is a Fellow of the ACM, the President of the VLDB Endowment, and the Program Chair for ACM SIGMOD 2011 in Athens, Greece. Her research interests are in the efficient, effective use of large volumes of complex, heterogeneous data. This interest spans data integration, data exchange, knowledge curation and data cleaning. She is a Professor and the Bell Canada Chair of Information Systems at the University of Toronto. In 2011, she was elected to the Fellowship of the Royal Society of Canada (FRSC), Canada's National Academy.