INDEX
    Explanations
    NEGATIVE LOGITS
    Logit   Token
    -0.07   "Cum"
    -0.06
    -0.06   "-X"
    -0.06   " human"
    -0.06
    -0.06   " Poll"
    -0.06   " stretched"
    -0.06   " Fn"
    -0.06   "-high"
    -0.06   " |--------------------------------------------------------------------------↵"
    POSITIVE LOGITS
    Logit   Token
    0.07    "тою"
    0.06    " arte"
    0.06    "ова"
    0.06    " persona"
    0.06    " activist"
    0.06    "са"
    0.06    " entrepreneur"
    0.06    ".sky"
    0.06    " simpl"
    0.06    "_THROW"
    Activation Density: 0.036%

    No Known Activations