    Negative Logits
    "-week"           -0.07
    " slowed"         -0.07
    " harmed"         -0.07
    " frü"            -0.07
    " democratic"     -0.06
    " ramps"          -0.06
    " thải"           -0.06
    " avoids"         -0.06
    " ramp"           -0.06
    " unsure"         -0.06
    Positive Logits
    " possession"      0.11
    ".Ass"             0.09
    " possess"         0.07
    " possessing"      0.07
    "_SECTION"         0.07
    "ουσ"              0.07
    "์↵↵"               0.07
    " possessions"     0.07
                       0.07
    " Lever"           0.07
    Activation Density: 0.009%

    No Known Activations