INDEX
    Explanations

    activations related to political terms

    occurrences of the letter "P"

    Negative Logits
    Token            Logit
    " adm"           -0.76
    " susp"          -0.69
    "ģĸ"             -0.69
    " diplom"        -0.68
    " Aval"          -0.67
    " Pry"           -0.67
    " phyl"          -0.66
    " wip"           -0.65
    " differe"       -0.65
    " transports"    -0.64
    Positive Logits
    Token            Logit
    "redict"         1.39
    "ossible"        1.35
    "ossession"      1.28
    "aired"          1.25
    "ractical"       1.21
    "ossibly"        1.20
    "odcast"         1.20
    "overty"         1.19
    "ierce"          1.18
    "ardon"          1.18
    Activation Density: 0.056%

    No Known Activations