INDEX
    Explanations

    positive expressions of agreement or support

    Negative Logits
    vez
    -0.15
    eda
    -0.15
    ην
    -0.14
    foundland
    -0.14
    irling
    -0.14
    lay
    -0.14
    lor
    -0.14
     samot
    -0.13
    æħ
    -0.13
    rol
    -0.13
    Positive Logits
    edly
    0.18
     запас
    0.17
    .compat
    0.17
    atively
    0.16
     same
    0.16
     Byl
    0.15
    eč
    0.15
     rằng
    0.14
    anced
    0.14
    .scalablytyped
    0.14
    Act Density 0.022%

    No Known Activations