Explanations
Words in a foreign language, potentially indicating a specific language or pattern in the text.
Negative Logits
Franch            -0.80
panels            -0.75
charm             -0.71
theless           -0.68
inev              -0.65
comparisons       -0.64
admission         -0.64
concede           -0.63
responsibility    -0.62
cooler            -0.62
Positive Logits
º                  1.48
¾                  1.47
²                  1.47
´                  1.37
¸                  1.35
¢                  1.31
¼                  1.30
©¶æ¥µ              1.30
½                  1.30
Ĩ                  1.30
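Token lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive scores. The sketch below assumes exactly that; the function names, the 2-dimensional "model", and the 4-token vocabulary are hypothetical toy values, not this feature's weights.

```python
# Hypothetical sketch: assumes logit scores = decoder direction @ unembedding matrix.

def feature_logits(decoder_dir, w_u, vocab):
    """Return {token: logit contribution} for one feature direction.

    decoder_dir: d_model floats (the feature's decoder direction; assumed)
    w_u: d_model rows, each with one float per vocabulary token (assumed unembedding)
    vocab: token strings, one per column of w_u
    """
    scores = [0.0] * len(vocab)
    for i, weight in enumerate(decoder_dir):
        for v, w in enumerate(w_u[i]):
            scores[v] += weight * w
    return dict(zip(vocab, scores))


def top_and_bottom(logits, k):
    """Split scores into the k most negative and the k most positive tokens."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive


# Toy example with made-up numbers.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
w_u = [
    [0.2, -1.0, 0.5, 0.0],
    [0.4, 0.6, -1.0, 0.0],
]
logits = feature_logits(decoder_dir, w_u, vocab)
negative, positive = top_and_bottom(logits, k=1)
print(negative, positive)
```

Each token's score is a single dot product, so a token appears in the positive list when its unembedding column points in roughly the same direction as the feature's decoder vector.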
Activation density: 0.015%
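An activation density of 0.015% means the feature fires on roughly 1.5 of every 10,000 tokens. The minimal sketch below assumes density is the fraction of token positions whose activation is strictly greater than zero; the function name and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where a feature's activation exceeds `threshold`.

    Assumes "active" means strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Made-up data: 3 firing positions out of 20,000 tokens.
acts = [0.0] * 20_000
for pos in (17, 4_210, 19_999):
    acts[pos] = 1.2
density = activation_density(acts)
print(f"{density:.3%}")  # → 0.015%
```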