INDEX
    Explanations
    Negative Logits
     anspruch   -0.09
     messing    -0.08
     manje      -0.08
     longo      -0.08
                -0.08
     PPO        -0.07
    ครบ         -0.07
     данный     -0.07
     bouncing   -0.07
    Centro      -0.07
    Positive Logits
     hush       0.11
     whispers   0.08
                0.08
     درباره     0.08
     silencio   0.08
    itory       0.08
     turns      0.08
    ify         0.07
     hearings   0.07
     Dawn       0.07
    Activation Density 0.002%

    No Known Activations