INDEX
    Explanations
    No Explanations Found
    Negative Logits
    "!"  0.93
    "."  0.93
    " or"  0.89
    " in"  0.85
    " rather"  0.80
    " might"  0.79
    " could"  0.76
    "↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵"  0.76
    "παϊ"  0.75
    " <"  0.74
    Positive Logits
    "tiempo"  1.12
    " wichtig"  1.05
    " Wsp"  0.99
    " сумму"  0.98
    " جیت"  0.97
    "ستي"  0.96
    "Ens"  0.96
    " canzone"  0.96
    " شدت"  0.95
    " combien"  0.95
    Activation Density: 0.003%

    No Known Activations