    Negative Logits
    " váž"              -0.06
    [token not shown]   -0.06
    " ideology"         -0.06
    " lunches"          -0.06
    "lor"               -0.06
    " Τα"               -0.06
    "/colors"           -0.06
    "ifa"               -0.06
    "cca"               -0.06
    "artifact"          -0.06
    Positive Logits
    " witty"            0.07
    "(ref"              0.07
    "worked"            0.07
    " Voters"           0.06
    " DIRECT"           0.06
    "_DECLARE"          0.06
    "ücken"             0.06
    " getattr"          0.06
    "_n"                0.06
    ";br"               0.06
    Activation Density: 0.007%

    No Known Activations