Data Science & DataLab

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data.
DataLab at SKKU pursues data-driven research under the slogan "Designing Science with Data." We apply data science technologies to data from domains such as scholarly publications, healthcare, and social media, and to data in many forms, including but not limited to relational data, text, graphs, and electronic health records (EHRs).


Human-Computer Information Retrieval

Human-Computer Information Retrieval (HCIR) combines information retrieval (IR) and human-computer interaction (HCI) to create systems that improve search by taking the human context into account.

Health Informatics

Health informatics is the application of information science and information technologies in the service of better health and better healthcare. We study, develop, and improve innovative information technologies in healthcare.

Science of Science

Science is an expanding and evolving network of ideas, scholars, and scholarly publications. Science of Science uses quantitative methods to understand the structure and dynamics of science, as well as the interactions among scientific entities.


Zhu, Yongjun

Director, Assistant Professor of Library & Information Science/Data Science

Research Assistants

Kim, Donghun

Research Assistant

Kim, Jaewon

Research Assistant

Jung, Woojin

Research Assistant

Nam, Seojin

Research Assistant

Graduate Students

Kang, Dongwon

Graduate Student

Lee, Kanghun

Graduate Student

Joo, Gayeon

Graduate Student


Journal Articles

  • Zhu, Y., Kim, M.C., & Yan, E. (2018). Evaluating interactive bibliographic information retrieval systems: A user-centered approach. Proceedings of the Association for Information Science and Technology, 55(1), 628-637.
  • Su, C., Tong, J., Zhu, Y., Cui, P., & Wang, F. (2018). Network embedding in biomedical data science. Briefings in Bioinformatics, bby117.
  • Kim, M.H., Banerjee, S., Zhao, Y., Wang, F., Zhang, Y., Zhu, Y., DeFerio, J., Evans, L., Park, S.M., & Pathak, J. (2018). Association networks in a matched case-control design – Co-occurrence patterns of preexisting chronic medical conditions in patients with major depression versus their matched controls. Journal of Biomedical Informatics, 87, 88-95.
  • Zhang, F., Yan, E., Niu, X., & Zhu, Y. (2018). Joint modeling of the association between NIH funding and its three primary outcomes: patents, publications, and citation impact. Scientometrics, 117(1), 591-602.
  • Zhu, Y., Kim, M., Banerjee, S., Deferio, J., Alexopoulos, G.S., & Pathak, J. (2018). Understanding the research landscape of major depressive disorder via literature mining: an entity-level analysis of PubMed data from 1948-2017. JAMIA Open, 1(1), 115-121.
  • Zhu, Y., Elemento, O., Pathak, J., & Wang, F. (2018). Drug knowledge bases and their applications in biomedical informatics research. Briefings in Bioinformatics, bbx169.
  • Yan, E. & Zhu, Y. (2018). Tracking word semantic change in biomedical literature. International Journal of Medical Informatics, 109, 76-86.
  • Song, I.-Y. & Zhu, Y. (2017). Big Data and Data Science: Opportunities and Challenges of iSchools. Journal of Data and Information Science, 2(3), 1-18.
  • Zhu, Y., Yan, E., & Song, I.-Y. (2017). A natural language interface to a graph-based bibliographic information retrieval system. Data & Knowledge Engineering, 111, 73-89.
  • Zhu, Y., Yan, E., & Wang, F. (2017). Semantic relatedness and similarity of biomedical terms: examining the effects of recency, size, and section of biomedical publications on the performance of word2vec. BMC Medical Informatics and Decision Making, 17(1), 95.
  • Zhu, Y. & Yan, E. (2017). Examining academic ranking and inequality in library and information science through faculty hiring networks. Journal of Informetrics, 11(2), 641-654.
  • Yan, E. & Zhu, Y. (2017). Adding the dimension of knowledge trading to source impact assessment: Approaches, indicators, and implications. Journal of the Association for Information Science & Technology, 68(5), 1090-1104.
  • Zhu, Y., Kim, M.C., & Chen, C. (2017). An investigation of the intellectual structure of opinion mining research. Information Research, 22(1), paper 739.
  • Zhu, Y. & Yan, E. (2016). Searching bibliographic data using graphs: A visual graph query interface. Journal of Informetrics, 10(4), 1092-1107.
  • Choi, N., Song, I.-Y., & Zhu, Y. (2016). A Model-based Method for Information Alignment: A Case Study on Educational Standards. Journal of Computing Science and Engineering, 10(3), 85-94.
  • Zhu, Y., Yan, E., & Song, M. (2016). Understanding the evolving academic landscape of library and information science through faculty hiring data. Scientometrics, 108(3), 1461-1478.
  • Zhu, Y., Song, M., & Yan, E. (2016). Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-based Approach. PLoS ONE, 11(5), e0156091.
  • Zhu, Y., Yan, E., & Song, I.-Y. (2016). The use of a graph-based system to improve bibliographic information retrieval: System design, implementation, and evaluation. Journal of the Association for Information Science & Technology, 68(2), 480-490.
  • Kim, M.C., Zhu, Y., & Chen, C. (2016). How are they different? A quantitative domain comparison of information visualization and data visualization (2000-2014). Scientometrics, 107(1), 123-165.
  • Song, I.-Y. & Zhu, Y. (2015). Big data and data science: what should we teach? Expert Systems, 33(4), 364-373.
  • Yan, E. & Zhu, Y. (2015). Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods. Journal of Informetrics, 9(3), 455–465.
  • Zhu, Y. & Yan, E. (2015). Dynamic subfield analysis of disciplines: An examination of the trading impact and knowledge diffusion patterns of computer science. Scientometrics, 104(1), 335-359.
  • Kim, H., Zhu, Y., Kim, W., & Sun, T. (2014). Dynamic faceted navigation in decision making using Semantic Web technology. Decision Support Systems, 61, 59-68.

Conference Papers

  • Zhu, Y., Kim, M.C., & Yan, E. (2018). Evaluating interactive bibliographic information retrieval systems: A user-centered approach. ASIS&T 2018, Vancouver, Canada.
  • Kim, M.H., Zhu, Y., Banerjee, S., Evans, L., Zhang, Y., Wang, F., Park, S.M., & Pathak, J. (2018). Comparing sex-specific association networks of chronic medical conditions. IEEE ICHI 2018. New York City, USA.
  • Yan, E. & Zhu, Y. (2017). Word semantic change: The law of differentiation vs. the law of parallel change. ISSI 2017. Wuhan, China.
  • Song, I.-Y., Zhu, Y., Ceong, H., & Thonggoom, O. (2015). Methodologies for Semi-automated Conceptual Data Modeling from Requirements. ER 2015. Stockholm, Sweden.
  • Zhu, Y., Yan, E., & Song, I.-Y. (2015). Topological Analysis of Interdisciplinary Scientific Journals: Which Journals Will be the Next Nature or Science? ACM RACS 2015. Prague, Czech Republic.
  • Kim, M. C., Feng, Y., Zhu, Y., & Ping, Q. (2015). Quantitative exploration into the diffusion process of creative ideas. ASIS&T 2015. Missouri, USA.
  • Zhu, Y., Jeon, D., Kim, W., Hong, J. S., Lee, M., Wen, Z., & Cai, Y. (2012). The Dynamic Generation of Refining Categories in Ontology-Based Search. JIST 2012. Nara, Japan.

Book Chapters

  • Kim, M.C. & Zhu, Y. (2018). Scientometrics of Scientometrics: Mapping Historical Footprint and Emerging Technologies in Scientometrics. In Scientometrics. IntechOpen.


Sungkyunkwan University

DSC3011: Applied Machine Learning
2019 Fall
LIS5055: Health Data Science
2019 Fall

DSC2004: Data Science and Python
2019 Spring
LIS5052: Programming Languages
2019 Spring

DSC3006: Practice in Machine Learning
2018 Fall
DSC3010: Practice in NoSQL Databases
2018 Fall

Cornell University

Health Data Mining
2017 Summer