INDEX
    Explanations
    NEGATIVE LOGITS
    " terribly"      -0.08
    "\t\t\t"         -0.07
    " joker"         -0.07
    " applies"       -0.07
    "_once"          -0.07
    " horribly"      -0.07
    " boo"           -0.07
    " göz"           -0.07
    "?,"             -0.07
    " desires"       -0.07
    POSITIVE LOGITS
    " Tape"          0.08
    " Linden"        0.08
    " lah"           0.08
    " kura"          0.08
    " Diskussion"    0.08
    " enfoque"       0.08
    "ãi"             0.08
    "פשר"            0.08
    " যেন"           0.07
    ".orange"        0.07
    Activation Density: 0.051%

    No Known Activations