    Negative Logits
    Token             Logit
    " decades"        -0.08
    " prophets"       -0.07
    " Examination"    -0.07
    " parece"         -0.07
    " bereits"        -0.07
    "Floor"           -0.06
    " thereby"        -0.06
    "NOW"             -0.06
    "proved"          -0.06
    "ov"              -0.06
    Positive Logits
    Token             Logit
    " Single"         0.10
    "Single"          0.09
    " single"         0.08
    "single"          0.08
    "/single"         0.08
    " 싱글"            0.07
    "ingle"           0.07
    [missing]         0.07
    "-single"         0.07
    Act Density 0.010%

    No Known Activations