A model of language learning

Agent-based modelling, Konstanz, 2024

Henri Kauhanen

23 April 2024

Plan

  • Starting this week, we will put programming to good use
  • We’ll start with a simple model of language learning
    • Here, learning = process of updating a linguistic representation
    • It doesn’t matter whether the learner is a child or an adult

Grammar competition

  • Assume two grammars \(G_1\) and \(G_2\) that generate languages \(L_1\) and \(L_2\)
    • language = set of strings (e.g. sentences)
  • In general, \(L_1\) and \(L_2\) will be different but may overlap:

Grammar competition

  • Three sets of interest: \(L_1 \setminus L_2\), \(L_1 \cap L_2\) and \(L_2 \setminus L_1\)

Concrete example

  • SVO (\(G_1\)) vs. V2 (\(G_2\))
    • A subject-initial sentence like "Mary bought a book" is compatible with both grammars, so it lies in \(L_1 \cap L_2\)
    • A sentence with a fronted non-subject and the verb in second position, like "Yesterday bought Mary a book", lies in \(L_2 \setminus L_1\)
    • A sentence with a fronted adverb but SVO order, like "Yesterday Mary bought a book", lies in \(L_1 \setminus L_2\)

Grammar competition

  • Suppose learner receives randomly chosen strings from \(L_1\) and \(L_2\)
  • Learner uses either \(G_1\) or \(G_2\) to parse incoming string
  • Define \(p =\) probability of use of \(G_1\)
  • How should the learner update \(p\) in response to interactions with the environment?

Variational learning

  • Suppose learner receives string/sentence \(s\)
  • Then the update is:

  Learner’s grammar | String received | Update
  ------------------|-----------------------------|---------------
  \(G_1\) | \(s \in L_1\) | increase \(p\)
  \(G_1\) | \(s \in L_2 \setminus L_1\) | decrease \(p\)
  \(G_2\) | \(s \in L_2\) | decrease \(p\)
  \(G_2\) | \(s \in L_1 \setminus L_2\) | increase \(p\)

  • In other words: the grammar chosen for parsing is rewarded (probability mass moves towards it) if it parses the string, and punished if it does not

Exercise

How can we increase/decrease \(p\) in practice? What is the update formula?

Exercise

Answer

One possibility (which we will stick to):

  • Increase: \(p\) becomes \(p + \gamma (1 - p)\)
  • Decrease: \(p\) becomes \(p - \gamma p\)

The parameter \(0 < \gamma < 1\) is a learning rate

Why this form of update formula?

  • We need to ensure that \(0 \leq p \leq 1\) always holds (it is a probability). Since \(0 < \gamma < 1\), the increase satisfies \(p + \gamma(1-p) \leq p + (1-p) = 1\), and the decrease satisfies \(p - \gamma p = (1-\gamma)p \geq 0\), so both updates keep \(p\) in \([0, 1]\).
  • Also notice:
    • When \(p\) is increased, what is added is \(\gamma (1-p)\). Since \(1-p\) is the probability of \(G_2\), this means transferring an amount of the probability mass of \(G_2\) onto \(G_1\).
    • When \(p\) is decreased, what is removed is \(\gamma p\). Since \(p\) is the probability of \(G_1\), this means transferring an amount of the probability mass of \(G_1\) onto \(G_2\).
    • Learning rate \(\gamma\) determines how much probability mass is transferred.
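  • In Julia, the two updates can be written as one-liners (a minimal sketch; the names increase and decrease are ours, not part of the model):

increase(p, gamma) = p + gamma * (1 - p)   # reward G1: move p towards 1
decrease(p, gamma) = p - gamma * p         # punish G1: move p towards 0

increase(0.5, 0.1)   # gives 0.55
decrease(0.5, 0.1)   # gives 0.45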

Plan

  • To implement a variational learner computationally, we need:
    1. A representation of a learner who embodies a single probability, \(p\), and a learning rate, \(\gamma\)
    2. A way to sample strings from \(L_1 \setminus L_2\) and from \(L_2 \setminus L_1\)
    3. A function that updates the learner’s \(p\)
  • Let’s attempt this now!

The struct

  • The first point is very easy:
mutable struct VariationalLearner
  p::Float64       # probability of using grammar G1
  gamma::Float64   # learning rate
end
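  • For example, a learner who starts at \(p = 0.5\) with learning rate \(\gamma = 0.01\) (values invented for illustration):

bob = VariationalLearner(0.5, 0.01)
bob.p       # 0.5
bob.gamma   # 0.01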

Sampling strings

  • For the second point, note that there are three types of strings, occurring with three corresponding probabilities (which sum to 1)
  • Let’s refer to the string types as "S1", "S12" and "S2", and to the probabilities as P1, P12 and P2:
  String type | Probability | Explanation
  ------------|-------------|----------------------------
  "S1" | P1 | \(s \in L_1 \setminus L_2\)
  "S12" | P12 | \(s \in L_1 \cap L_2\)
  "S2" | P2 | \(s \in L_2 \setminus L_1\)

Sampling strings

  • In Julia, sampling from a finite set of options (here, the three string types) with corresponding probabilities is handled by the sample() function, which lives in the StatsBase package
  • First, install and load the package:
using Pkg
Pkg.add("StatsBase")   # installation is only needed once
using StatsBase

Sampling strings

Now to sample a string, you can do the following:

# the three probabilities (just some numbers I invented)
P1 = 0.4
P12 = 0.5
P2 = 0.1

# sample one string
sample(["S1", "S12", "S2"], Weights([P1, P12, P2]))
"S12"

Tidying up

  • The above works but is a bit cumbersome – for example, every time you want to sample a string, you need to refer to the three probabilities
  • Let’s carry out a bit of software engineering to make this nicer to use
  • First, we encapsulate the probabilities in a struct of their own:
struct LearningEnvironment
  P1::Float64    # probability of a string from L1 but not L2
  P12::Float64   # probability of a string from both L1 and L2
  P2::Float64    # probability of a string from L2 but not L1
end

Tidying up

  • We then define the following function:
function sample_string(x::LearningEnvironment)
  # draw one string type according to the environment's probabilities
  sample(["S1", "S12", "S2"], Weights([x.P1, x.P12, x.P2]))
end
sample_string (generic function with 1 method)
  • Test the function:
paris = LearningEnvironment(0.4, 0.5, 0.1)
sample_string(paris)
"S12"

Implementing learning

  • We now need to tackle point 3, the learning function which updates the learner’s state
  • This needs to do three things:
    1. Sample a string from the learning environment
    2. Pick a grammar to try and parse the string with
    3. Update \(p\) in response to whether parsing was successful or not

Exercise

How would you implement point 2, i.e. picking a grammar to try and parse the incoming string?

Exercise

Answer

We can again use the sample() function from StatsBase, and define:

function pick_grammar(x::VariationalLearner)
  # choose "G1" with probability p, "G2" with probability 1 - p
  sample(["G1", "G2"], Weights([x.p, 1 - x.p]))
end
pick_grammar (generic function with 1 method)
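
A quick sanity check (the learner's values are invented): a learner with \(p = 0.9\) should pick "G1" roughly 90% of the time:

learner = VariationalLearner(0.9, 0.01)
count(==("G1"), [pick_grammar(learner) for _ in 1:1000])   # ≈ 900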

Implementing learning

  • Now it is easy to implement the first two points of the learning function:
function learn!(x::VariationalLearner, y::LearningEnvironment)
  s = sample_string(y)   # step 1: sample a string
  g = pick_grammar(x)    # step 2: pick a grammar
  # step 3, updating x.p, is still missing!
end
learn! (generic function with 1 method)
  • How do we implement the last point, i.e. updating \(p\)?

Aside: conditional statements

  • Here, we will be helped by conditionals:
if COND1
  # this is executed if COND1 is true
elseif COND2
  # this is executed if COND1 is false but COND2 is true
else
  # this is executed otherwise
end
  • Note: only the if block is necessary; elseif and else are optional, and there may be more than one elseif block

Aside: conditional statements

  • Try this for different values of number:
number = 1

if number > 0
  println("Your number is positive!")
elseif number < 0
  println("Your number is negative!")
else
  println("Your number is zero!")
end

Comparison \(\neq\) assignment

Important

To compare equality of two values inside a condition, you must use a double equals sign, ==. This is because the single equals sign, =, is already reserved for assigning values to variables.

if 0 = 1    # throws an error!
  println("The world is topsy-turvy")
end

if 0 == 1   # works as expected
  println("The world is topsy-turvy")
end

Exercise

  • Use an if ... elseif ... else ... end block to finish off our learn! function
  • Tip: logical “and” is &&, logical “or” is ||
  • Recall:
  Learner’s grammar | String received | Update
  ------------------|-----------------------------|---------------
  \(G_1\) | \(s \in L_1\) | increase \(p\)
  \(G_1\) | \(s \in L_2 \setminus L_1\) | decrease \(p\)
  \(G_2\) | \(s \in L_2\) | decrease \(p\)
  \(G_2\) | \(s \in L_1 \setminus L_2\) | increase \(p\)

Exercise

Answer

Important! The following function, which we originally used, has a bug: it does not update the learner’s state on input strings from \(L_1 \cap L_2\) (type "S12"). See below for the fixed version.

function learn!(x::VariationalLearner, y::LearningEnvironment)
  s = sample_string(y)
  g = pick_grammar(x)

  # bug: strings of type "S12" match none of the branches below,
  # so they never trigger an update
  if g == "G1" && s == "S1"
    x.p = x.p + x.gamma * (1 - x.p)
  elseif g == "G1" && s == "S2"
    x.p = x.p - x.gamma * x.p
  elseif g == "G2" && s == "S2"
    x.p = x.p - x.gamma * x.p
  elseif g == "G2" && s == "S1"
    x.p = x.p + x.gamma * (1 - x.p)
  end

  return x.p
end

Exercise

Answer

function learn!(x::VariationalLearner, y::LearningEnvironment)
  s = sample_string(y)
  g = pick_grammar(x)

  if g == "G1" && s != "S2"        # G1 parses "S1" and "S12": reward G1
    x.p = x.p + x.gamma * (1 - x.p)
  elseif g == "G1" && s == "S2"    # G1 fails on "S2": punish G1
    x.p = x.p - x.gamma * x.p
  elseif g == "G2" && s != "S1"    # G2 parses "S2" and "S12": reward G2
    x.p = x.p - x.gamma * x.p
  elseif g == "G2" && s == "S1"    # G2 fails on "S1": punish G2
    x.p = x.p + x.gamma * (1 - x.p)
  end

  return x.p
end
learn! (generic function with 1 method)

Testing our code

  • Let’s test our code! (Of the five calls below, only the last return value is echoed.)
bob = VariationalLearner(0.5, 0.01)
paris = LearningEnvironment(0.4, 0.5, 0.1)

learn!(bob, paris)
learn!(bob, paris)
learn!(bob, paris)
learn!(bob, paris)
learn!(bob, paris)
0.51489901495

Testing our code

trajectory = [learn!(bob, paris) for t in 1:1000]
1000-element Vector{Float64}:
 0.5097500248005
 0.514652524552495
 0.5195059993069701
 0.5143109393139004
 0.5191678299207614
 0.5239761516215538
 0.5287363901053382
 0.5334490262042848
 0.5381145359422419
 0.5327333905828194
 0.5374060566769913
 0.5420319961102213
 0.5466116761491191
 ⋮
 0.8043883364948524
 0.7963444531299039
 0.7983810085986048
 0.7903971985126188
 0.7924932265274927
 0.7945682942622178
 0.7966226113195956
 0.7986563852063996
 0.7906698213543356
 0.7927631231407922
 0.7948354919093843
 0.7968871369902905

Plotting the learning trajectory

using Plots
plot(1:1000, trajectory)
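
The trajectory climbs away from the initial \(p = 0.5\) and then fluctuates around \(p \approx 0.8\); for a small learning rate, this matches the long-run value \(P1 / (P1 + P2) = 0.4 / 0.5 = 0.8\). Optionally, the axes can be labelled using keyword arguments of Plots.jl:

plot(1:1000, trajectory, xlabel = "time", ylabel = "p", legend = false)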

Bibliographical remarks

  • For more on the notion of grammar competition, see Kroch (1989, 1994)
  • The variational learner is originally due to Yang (2000, 2002)
  • The learning algorithm itself is much older: it goes back to Bush and Mosteller (1955)

Summary

  • You’ve learned a few important concepts today:
    • Grammar competition and variational learning
    • How to sample objects according to a discrete probability distribution
    • How to use conditional statements
    • How to make a simple plot of a learning trajectory
  • You get to practice these in the homework
  • Next week, we’ll take the model to a new level and consider what happens when several variational learners interact

References

Bush, Robert R., and Frederick Mosteller. 1955. Stochastic Models for Learning. New York, NY: Wiley.
Kroch, Anthony S. 1989. “Reflexes of Grammar in Patterns of Language Change.” Language Variation and Change 1 (3): 199–244. https://doi.org/10.1017/S0954394500000168.
———. 1994. “Morphosyntactic Variation.” In Proceedings of the 30th Annual Meeting of the Chicago Linguistic Society, edited by K. Beals, 180–201. Chicago, IL: Chicago Linguistic Society.
Yang, Charles D. 2000. “Internal and External Forces in Language Change.” Language Variation and Change 12: 231–250. https://doi.org/10.1017/S0954394500123014.
———. 2002. Knowledge and Learning in Natural Language. Oxford: Oxford University Press.