INDEX
    Explanations

    conversational phrases or expressions involving personal pronouns and contractions

    Negative Logits
    Token            Logit
    " def"           -0.57
    " main"          -0.56
    " sim"           -0.56
    " base"          -0.56
    " file"          -0.55
    " ver"           -0.55
    " gra"           -0.54
    " franchise"     -0.54
    " bes"           -0.54
    " del"           -0.53
    Positive Logits
    Token            Logit
    " berdua"        0.46
    " ślub"          0.40
    " Económica"     0.39
    " Erfindung"     0.39
    "ających"        0.38
    "AndEndTag"      0.38
    " Absicht"       0.37
    " wystarczy"     0.37
    " chcą"          0.37
    " triliun"       0.37
    Activation Density: 0.091%

    No Known Activations