INDEX
    Explanations

    pronouns saying or thinking

    Negative Logits
    " lika"      -1.19
    "Cinta"      -1.13
    "Moda"       -1.12
    "bilden"     -1.12
    " klaus"     -1.07
    " plafon"    -1.07
    " rah"       -1.07
                 -1.06
    "Aktu"       -1.06
    " tarif"     -1.06
    Positive Logits
    " said"      1.62
    " says"      1.55
    " say"       1.16
    " should"    1.13
    " noted"     1.07
    "And"        1.00
    "he"         0.97
    "に来て"     0.96
    "Note"       0.94
    " spune"     0.92
    Activation Density 0.011%

    No Known Activations