    Explanations

    pronouns and references to individuals

    Negative Logits
    Token                 Logit
    "Personensuche"       -0.74
    (token not rendered)  -0.56
    " realisation"        -0.54
    " visualisation"      -0.52
    " grading"            -0.51
    " snowing"            -0.51
    " femen"              -0.51
    "WithTag"             -0.50
    "Thrown"              -0.50
    "IntoConstraints"     -0.50
    Positive Logits
    Token                 Logit
    " was"                 1.08
    " can"                 1.07
    " is"                  1.04
    " had"                 1.04
    " will"                1.02
    " would"               1.01
    " has"                 0.99
    " could"               0.94
    " actually"            0.93
    " also"                0.91
    Activation Density: 1.050%

    No Known Activations