INDEX
Explanations
punctuation and formatting markers in the text
Negative Logits
oger          -0.15
kat           -0.15
curves        -0.14
auss          -0.14
онÑĮ          -0.14
communities   -0.14
curve         -0.14
bottle        -0.14
ography       -0.14
ο             -0.13
Positive Logits
æĪ            0.15
ARIO          0.15
JNI           0.15
charm         0.15
éϵ            0.14
heimer        0.14
otoxic        0.14
.tie          0.14
درÛĮ          0.14
ãģ¨ãģĨ        0.14
Activations Density 0.023%