    Negative Logits
        " Others"     -0.08
        " others"     -0.08
        " other"      -0.08
        " enough"     -0.07
        " and"        -0.07
        "ali"         -0.07
        " Other"      -0.07
        " outras"     -0.07
        "Others"      -0.06
        " altri"      -0.06
    Positive Logits
        "thing"        0.10
        " thing"       0.09
        "THING"        0.09
        "будь"         0.09
        "onec"         0.08
        " nào"         0.08
        "InstanceOf"   0.08
        "ething"       0.08
        "kind"         0.07
        ":Any"         0.07
    Activation Density 0.009%
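A density of 0.009% is a fraction of 9 × 10⁻⁵, or roughly 9 activating token positions per 100,000 tokens. The sketch below assumes density means the share of token positions where the feature's activation is strictly positive; the dashboard does not name the token sample it was measured on. The function name `activation_density` and the synthetic activations are illustrative only.

```python
def activation_density(activations):
    """Fraction of token positions where the feature activation is > 0."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)


# Synthetic example: 100,000 token positions, the feature fires on 9 of them.
acts = [0.0] * 100_000
for i in range(9):
    acts[i * 10_000] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.009%
```

At a rate this low, a modest sample of text can easily contain zero activating examples.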

    No Known Activations