INDEX
    Explanations

    references to political norms and behaviors

    Negative Logits
    " idéia"              -0.72
    " basée"              -0.71
    " dégust"             -0.61
    "tablir"              -0.60
    " pierdas"            -0.59
    " basé"               -0.59
    " Erişim"             -0.59
    " AssemblyVersion"    -0.59
    " bacio"              -0.58
    " engraçadas"         -0.57

    Positive Logits
    " animating"           0.79
    "ിച്ച"                  0.71
    "<_>"                  0.63
    " Leviathan"           0.60
    " plau"                0.58
    "coher"                0.58
    " elites"              0.57
    " sclero"              0.57
    " tolerably"           0.56
    " ―――――"               0.56
    Activation Density: 0.889%

    No Known Activations