CMPUT 605 - Reading List (in progress)
- a. A reading on finding similar items, Chapter 3 from the book Mining of Massive Datasets, Leskovec, Rajaraman, and Ullman, 2014 (more emphasis on Sec. 3.3-3.7).
- c. A reading on neural nets and deep learning, Chapter 13 from the book Mining of Massive Datasets, Leskovec, Rajaraman, and Ullman, 2014.
- a. Yaghmazadeh, Navid, Yuepeng Wang, Isil Dillig, and Thomas Dillig. SQLizer: query synthesis from natural language, OOPSLA 2017.
- b. Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, and Matthew Richardson. RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers, ACL 2020. [Maliha]
- c. Li Dong, Mirella Lapata. Language to logical form with neural attention. ACL 2016.
- d. Li Dong, Mirella Lapata. Coarse-to-Fine Decoding for Neural Semantic Parsing. ACL 2018.
- e. Ruichu Cai, Boyan Xu, Zhenjie Zhang, Xiaoyan Yang, Zijian Li, Zhihao Liang. An Encoder-Decoder Framework Translating Natural Language to Database Queries. IJCAI 2018. [Mohammadreza]
- f. Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task, EMNLP 2018. (dataset-not good for presentation)
- g. Zhong, Victor, Caiming Xiong, and Richard Socher. Seq2SQL: Generating structured queries from natural language using reinforcement learning, arXiv preprint arXiv:1709.00103 (2017). (dataset-not good for presentation)
- a. Hyeonji Kim, Byeong-Hoon So, Wook-Shin Han, Hongrae Lee. Natural language to SQL: Where are we today?, VLDB 2020. (overview/survey-not good for presentation)
- b. S. Chu, B. Murphy, J. Roesch, A. Cheung, and D. Suciu. Axiomatic foundations and algorithms for deciding semantic equivalences of SQL queries. PVLDB, 11(11):1482–1495, 2018.
- c. J. Castelein, M. F. Aniche, M. Soltani, A. Panichella, and A. van Deursen. Search-based test data generation for SQL queries. ICSE 2018. [Mohammadreza]
- d. S. Chu, C. Wang, K. Weitz, and A. Cheung. Cosette: An automated prover for SQL. CIDR 2017. (demo-not good for presentation)
- e. Ruiqi Zhong, Tao Yu, Dan Klein. Semantic Evaluation for Text-to-SQL with Distilled Test Suites. arXiv:2010.02840. 2020. [Maliha]
- a. Suadaa, Lya Hulliyyatus, Hidetaka Kamigaito, Kotaro Funakoshi, Manabu Okumura, and Hiroya Takamura. Towards table-to-text generation with numerical reasoning, ACL 2021. [Mohammadreza]
- b. Puduppully, Ratish, Li Dong, and Mirella Lapata. Data-to-text generation with entity modeling, ACL 2019.
- c. Chen, Wenhu, Jianshu Chen, Yu Su, Zhiyu Chen, and William Yang Wang. Logical natural language generation from open-domain tables, ACL 2020. [Maliha]
- d. Parikh, Ankur P., Xuezhi Wang, Sebastian Gehrmann, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, and Dipanjan Das. ToTTo: A controlled table-to-text generation dataset, EMNLP 2020. (dataset-not good for presentation)
- e. Nan, Linyong, Dragomir Radev, Rui Zhang, Amrit Rau, Abhinand Sivaprasad, Chiachun Hsieh, Xiangru Tang et al. DART: Open-Domain Structured Data Record to Text Generation, NAACL 2021. (dataset-not good for presentation)
- f. Nan, Linyong, Chiachun Hsieh, Ziming Mao, Xi Victoria Lin, Neha Verma, Rui Zhang, Wojciech Kryściński et al. FeTaQA: Free-form Table Question Answering, arXiv preprint arXiv:2104.00369 (2021). (dataset-not good for presentation)
- a. F. Nargesian, E. Zhu, Ken Pu, R.J. Miller: Table union search on open data, VLDB 2018.
- b. E. Zhu, F. Nargesian, K.Q. Pu, and R.J. Miller: LSH ensemble: Internet-scale domain search, VLDB 2016.
- c. A. Alserafi, A. Abello, O. Romero, T. Calders. Keeping the Data Lake in Form: Proximity Mining for Pre-Filtering Schema Matching, ACM TOIS 2020.
- d. O. Lehmberg, C. Bizer: Stitching web tables for improving matching quality, VLDB 2017.
- e. A. Dargahi, D. Rafiei: Efficiently transforming tables for joinability. CoRR abs/2111.09912 (2021).
- f. A. Hasnat, D. Rafiei: Interactive set discovery. CoRR abs/2111.09917 (2021).
- a. Y.Y. Weiss, S. Cohen. Reverse engineering SPJ-queries from examples, PODS 2017.
- b. D.V. Kalashnikov, L. Lakshmanan, D. Srivastava. FastQRE: Fast query reverse engineering, SIGMOD 2018.
- c. F. Psallidas, B. Ding, K. Chakrabarti, S. Chaudhuri. S4: Top-k spreadsheet-style search for query discovery, SIGMOD 2015.
- d. A. Bonifati, R. Ciucanu, S. Staworko. Interactive Inference of Join Queries, EDBT 2014: 451-462.
- a. Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter 19, no. 1 (2017): 22-36. (overview-not good for presentation)
- b. Kai Shu, Suhang Wang, and Huan Liu. Beyond news contents: The role of social context for fake news detection. WSDM 2019.
- c. Linmei Hu, Tianchi Yang, Luhao Zhang, Wanjun Zhong, Duyu Tang, Chuan Shi, Nan Duan, and Ming Zhou. Compare to The Knowledge: Graph Neural Fake News Detection with External Knowledge. ACL 2021.
- a. I.I. Ceylan, A. Darwiche, G. Van den Broeck. Open-world probabilistic databases: Semantics, algorithms, complexity, Artificial Intelligence 295 (2021): 103474. (an earlier/shorter version in Description Logics 2016).
- b. L. Orr, S. Ainsworth, W. Cai, K. Jamieson, M. Balazinska, D. Suciu. Mosaic: a sample-based database system for open world query processing. arXiv preprint arXiv:1912.07777 (2019).
- c. E. Ruckhaus, E. Ruiz, M-E. Vidal. Query evaluation and optimization in the semantic web. Theory and Practice of Logic Programming 8, no. 3 (2008): 393-409.
- d. L. Orr, M. Balazinska, D. Suciu. Sample debiasing in the Themis open world database system. SIGMOD 2020.
Items marked as ** are given as additional references and are not for weekly discussions.