John Philip McCrae

I am an established researcher with 8 years of post-PhD experience and deputy head of the NLP group at the Insight Centre, the largest centre for data science in Ireland. My work has focussed on the intersection of NLP and data science, and I have led the development of the linguistic linked open data cloud, a large-scale integration of many language resources. I currently supervise five students. Two of them are co-supervised with Dr. Paul Buitelaar, the head of the NLP group, who works on semantic technologies for natural language processing. One is supervised in collaboration with Dr. Clodagh Downey of the Irish Dept. at NUI Galway, who researches Old and Middle Irish language and literature. For the remaining two I am primary supervisor, supported by Dr. Mihael Arcan, a postdoctoral researcher in the NLP group.
I completed my PhD within 3 years while also publishing a journal article (with 47 citations) and contributing to the BioCaster system for detecting disease outbreaks by processing texts in East Asian languages. After joining Bielefeld University in 2009, I played a leading role in at least two major scientific breakthroughs. Firstly, the development of the lemon Lexicon Model for Ontologies was a major contribution to the representation of semantics relative to natural language. It is now used by most relevant research groups and was one of the most significant outcomes of the Monnet project, an FP7-funded project. Secondly, building on this work, I have been instrumental in establishing linguistic linked open data as a major research theme, which has been supported by over a dozen workshops and events and was a major theme of the 2016 Language Resources and Evaluation Conference (LREC).
This topic led to the FP7-funded Lider project, which used linguistic linked open data as an enabler for content analytics in enterprise, and in which I played a major role both in writing the grant and in implementing the work plan. More recently, my work in linked data has played a pivotal role in obtaining funding for the ELEXIS project (under H2020-INFRAIA), where we will apply linked data technologies to lexicography. My work has led to 73 publications, and on Google Scholar I have 1,310 total citations and an h-index of 19. Nearly all of these citations are for work that did not involve my PhD supervisor, and I have published with 104 co-authors from institutions around the world. This grant will help me to further consolidate my team within the Insight Centre and to continue my work on bridging the research areas of natural language processing and data science.




  • Chair, 1st Conference on Language, Data and Knowledge (LDK 2017)
  • Director, 1st and 2nd Summer Datathon on Linguistic Linked Open Data (SD-LLOD-15, SD-LLOD-17)
  • Chair of 7 workshops (including 5 instances of the Linked Data in Linguistics Series) and 5 tutorials
  • Guest lecturer at EUROLAN 2015 (“Ontology-Lexica with Lemon”) and ESSLLI 2018 (“Introduction to Linked Open Data in Linguistics”)
  • Chair of W3C Community Group on Best Practices for Multilingual Linked Open Data (BPMLOD)
  • Member of program committee for over 60 major conferences and workshops, including ACL, EMNLP, COLING, LREC, ISWC, ESWC
  • Reviewer for 12 journals, including Journal of Artificial Intelligence Research, Artificial Intelligence (Elsevier), Natural Language Engineering and Semantic Web Journal
  • Guest editor of special issue of MDPI Information on “Towards the Multilingual Web of Data”
  • Guest editor of special issue of the Semantic Web Journal on “Multilingual Linked Data”


  • Natural Language Processing (MSc Course, NUI Galway) - Lecturer 15/16, 16/17, 17/18
  • Statistical Natural Language Processing (MSc Course, Bielefeld University) - Lecturer 13/14, 14/15
  • Introduction to the Semantic Web (MSc Course, Bielefeld University) - Tutor 10/11

Research visits

  • Princeton University, November-December 2013
  • Technical University of Delft, September 2010