INDEX
    Explanations

    references to friendships and interpersonal relationships

    Negative Logits
     were
    -0.17
    were
    -0.17
    uzzer
    -0.17
     weren
    -0.16
     Were
    -0.15
     بودند
    -0.15
    Were
    -0.14
    ómo
    -0.14
    IDER
    -0.14
    (tol
    -0.14
    Positive Logits
     ist
    0.40
     kommt
    0.38
     hat
    0.38
     wird
    0.37
     steht
    0.35
     stellt
    0.35
     lässt
    0.34
     bleibt
    0.34
     liegt
    0.33
     geht
    0.33
    Act Density 0.048%

    No Known Activations