    Explanations

    Dialogue and filler words

    Negative Logits
    icul         -0.08
    alyzed       -0.08
    aly          -0.07
    Directed     -0.07
    ifferent     -0.07
     inactivity  -0.07
    елик         -0.07
     συμβ        -0.07
    iculos       -0.07
    ellisen      -0.07
    Positive Logits
    вр           0.10
    كس           0.09
     graag       0.08
                 0.08
     مج          0.08
     кроме       0.08
     liever      0.08
    !',↵         0.08
     ahubwo      0.08
     غواړي       0.08
    Activation Density 0.022%

    No Known Activations