Syntax

John P. McCrae - University of Galway

Course at ESSLLI 2025

Part-of-speech analysis

Part-of-speech tags

Tag    Description                Example
ADJ    adjective                  yellow, big, international
ADP    adposition                 with, in, at
ADV    adverb                     quickly, yesterday, tomorrow
AUX    auxiliary                  is, has (done), will (do)
CCONJ  coordinating conjunction   and, or, but
DET    determiner                 a, an, the
INTJ   interjection               psst, ouch, bravo, hello
NOUN   noun                       cat, tree, air, beauty
NUM    numeral                    1, 2017, one, seventy-seven, IV, MMXIV
PART   particle                   's, not
PRON   pronoun                    I, you, he, she
PROPN  proper noun                Mary, John, London, NATO, HBO
PUNCT  punctuation                ., (, ), ?
SCONJ  subordinating conjunction  if, while, that
SYM    symbol                     $, %, §, ©, 😝
VERB   verb                       work, type, run, speak
X      other                      sfpksdpsxmsa

Part-of-speech tagging

Assign a part-of-speech tag to each token

Token     POS
I         PRON
am        AUX
a         DET
linguist  NOUN

spaCy

For our purposes, we'll use the spaCy library
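
A minimal sketch of tagging with spaCy. It assumes spaCy and the small English model `en_core_web_sm` are installed; the model choice and the helper names `pos_table` and `tag_text` are illustrative and not taken from the slides. The `pos_` attribute holds the coarse tags listed under Part-of-speech tags.

```python
def pos_table(doc):
    """Return (token, coarse POS tag) pairs for a spaCy Doc, or for any
    iterable of objects that have .text and .pos_ attributes."""
    return [(token.text, token.pos_) for token in doc]


def tag_text(text, model="en_core_web_sm"):
    """Tag raw text with spaCy.

    Assumes spaCy and the named model are installed, e.g. via
    `python -m spacy download en_core_web_sm`.
    """
    import spacy  # imported lazily so pos_table works without spaCy

    nlp = spacy.load(model)
    return pos_table(nlp(text))


# Usage (requires spaCy and the model):
# for token, pos in tag_text("I am a linguist"):
#     print(token, pos)
```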

Example: English genitives

  • We can now look at the 's genitive
  • Is this a clitic?
  • Does the use of more genitives indicate a more informal text?
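
One possible way to put a number on the informality question is to count 's genitives per 1,000 tokens and compare the rate across texts of different registers. The sketch below assumes fine-grained Penn Treebank tags, where `POS` marks the possessive ending; the function name `genitive_rate` and the toy input are made up for illustration.

```python
def genitive_rate(tagged_tokens, per=1000):
    """Rate of 's genitives per `per` tokens.

    `tagged_tokens` is a list of (text, fine_tag) pairs. The Penn Treebank
    tag "POS" marks the possessive ending ('s or a bare apostrophe).
    """
    if not tagged_tokens:
        return 0.0
    hits = sum(1 for _, tag in tagged_tokens if tag == "POS")
    return per * hits / len(tagged_tokens)


# Hand-tagged toy input. With spaCy the pairs could come from
# [(t.text, t.tag_) for t in nlp(text)], assuming an English model.
sample = [("Mary", "NNP"), ("'s", "POS"), ("cat", "NN"), ("sleeps", "VBZ")]
print(genitive_rate(sample))  # 250.0
```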

Parsing

Understand the syntactic structure of a sentence by means of the relationships between words

The girl, who is playing with cars, likes toast

Parsing as rewriting

The girl, who is playing with cars, likes toast

DET NOUN SCONJ AUX VERB ADP NOUN VERB NOUN

NP REL VP

NP VP

S
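
The rewriting steps can be sketched as a small bottom-up reducer. The rule set below is a toy grammar invented to reproduce this one derivation (the flat REL rule in particular is a simplification), and the names `RULES`, `rewrite_once` and `derive` are hypothetical.

```python
# Toy rewrite rules (an assumption for illustration): each pairs a sequence
# of symbols with the single symbol that replaces it.
RULES = [
    (("DET", "NOUN"), "NP"),
    (("SCONJ", "AUX", "VERB", "ADP", "NOUN"), "REL"),
    (("VERB", "NOUN"), "VP"),
    (("NP", "REL"), "NP"),
    (("NP", "VP"), "S"),
]


def rewrite_once(symbols, rules=RULES):
    """Apply the first rule (in list order) at its leftmost match.

    Returns the rewritten list, or None if no rule matches.
    """
    for rhs, lhs in rules:
        n = len(rhs)
        for i in range(len(symbols) - n + 1):
            if tuple(symbols[i:i + n]) == rhs:
                return symbols[:i] + [lhs] + symbols[i + n:]
    return None


def derive(symbols, rules=RULES):
    """Rewrite until no rule applies and return every intermediate step."""
    steps = [list(symbols)]
    while True:
        nxt = rewrite_once(steps[-1], rules)
        if nxt is None:
            return steps
        steps.append(nxt)


tags = "DET NOUN SCONJ AUX VERB ADP NOUN VERB NOUN".split()
for step in derive(tags):
    print(" ".join(step))
```

Every rule replaces at least two symbols with one, so the sequence shrinks at each step and the loop always terminates.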

Phrase grammar

Dependency grammar

Syntactic ambiguity

Ambiguity

https://www.menti.com/nmv7zbqr9s

Example: Placement of adverbs

  • There is no such thing as an adverb!
  • Adverbs are a catch-all category covering several different usages
  • Many have claimed that adverbs are a special version of adjectives

  • We will examine whether -ly adverbs have a different usage pattern from other adverbs
  • We will define this by what they modify
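
A rough sketch of this comparison, assuming each token comes with its POS tag and the POS tag of its dependency head. The function name `adverb_head_profile` and the hand-annotated sentence are illustrative. The `endswith("ly")` test is a crude heuristic: it also catches words such as "only" and "early".

```python
from collections import Counter


def adverb_head_profile(tokens):
    """Count the POS of the word each adverb modifies, split into
    '-ly' adverbs and all other adverbs.

    `tokens` is an iterable of (text, pos, head_pos) triples.
    """
    profile = {"-ly": Counter(), "other": Counter()}
    for text, pos, head_pos in tokens:
        if pos != "ADV":
            continue
        group = "-ly" if text.lower().endswith("ly") else "other"
        profile[group][head_pos] += 1
    return profile


# Hand-annotated toy sentence: "She quickly ate a very big cake".
# With spaCy the triples could come from
# [(t.text, t.pos_, t.head.pos_) for t in nlp(text)].
toy = [
    ("She", "PRON", "VERB"),
    ("quickly", "ADV", "VERB"),
    ("ate", "VERB", "VERB"),
    ("a", "DET", "NOUN"),
    ("very", "ADV", "ADJ"),
    ("big", "ADJ", "NOUN"),
    ("cake", "NOUN", "VERB"),
]
profile = adverb_head_profile(toy)
print(dict(profile["-ly"]), dict(profile["other"]))
```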

Language Usage

Chunking

  • Parsing allows us to group words together as phrases
  • We can also use simple (regular expression) patterns to extract chunks
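import nltk
from nltk.corpus import brown  # run nltk.download("brown") once
# Brown tags articles as AT rather than DT, so the pattern accepts both
grammar = "NP: {<AT|DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
cp.parse(brown.tagged_sents()[0])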

Named Entities

  • Named entities are phrases that refer to specific entities
  • Further specialised by type (person, location, organisation, etc.)
  • We can extract these with spaCy
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this English model is installed
doc = nlp("Apple is looking at buying U.K. startup")

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Multiword Expressions

  • Multiword expressions are phrases that have a special meaning
  • A large annotated corpus of MWEs is the PARSEME corpus

Concordances

We can use the NLTK concordance function to find examples of a word in context

from nltk.corpus import gutenberg  # run nltk.download("gutenberg") once
from nltk.text import Text

corpus = gutenberg.words('melville-moby_dick.txt')
text = Text(corpus)
text.concordance("monstrous")

Word Sketches

  • A one-page, automatic, corpus-derived summary of a word
  • Used by Sketch Engine in lexicography
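
The idea behind a word sketch can be approximated by grouping a word's collocates by grammatical relation. This is only a sketch under simplifying assumptions: it ranks by raw frequency, whereas Sketch Engine ranks collocates with an association score. The function name `word_sketch`, the `_of` suffix and the dependency triples are all invented for illustration.

```python
from collections import Counter, defaultdict


def word_sketch(triples, word, top_n=3):
    """Group the collocates of `word` by grammatical relation.

    `triples` are (head_lemma, relation, dependent_lemma) tuples. Relations
    in which `word` is the dependent get an "_of" suffix (a naming choice
    made here, not a standard).
    """
    sketch = defaultdict(Counter)
    for head, rel, dep in triples:
        if head == word:
            sketch[rel][dep] += 1
        elif dep == word:
            sketch[rel + "_of"][head] += 1
    return {rel: counts.most_common(top_n) for rel, counts in sketch.items()}


# Invented toy triples; real ones would come from a dependency-parsed corpus.
triples = [
    ("whale", "amod", "monstrous"),
    ("whale", "amod", "white"),
    ("whale", "amod", "white"),
    ("see", "obj", "whale"),
]
print(word_sketch(triples, "whale"))
```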

Example: Diachronic change

Summary

  • Corpus selection is key to answering linguistic research questions
  • Modern NLP can analyse words, syntax, dependencies and language usage
  • Concordances can reveal interesting patterns of language usage