INDEX
    Explanations

    mentions of a specific place name

    references to the word "Glad."

    Negative Logits
    Token            Logit
    " Mutant"        -0.71
    " Moore"         -0.70
    " temper"        -0.68
    " session"       -0.66
    " Nero"          -0.63
    " Hawkins"       -0.63
    " practice"      -0.63
    " square"        -0.62
    " Apocalypse"    -0.61
    " imp"           -0.61
    Positive Logits
    Token            Logit
    "lad"            4.64
    " Lad"           1.54
    " Glad"          1.30
    "lav"            1.17
    "adder"          1.05
    "lass"           1.02
    "laden"          1.02
    "lol"            1.02
    "lam"            1.01
    "los"            1.01
    Activation Density: 0.010%

    No Known Activations