INDEX
Explanations
punctuation and certain contractions or abbreviations in the text
Negative Logits
Token             Logit
igger             -0.16
erli              -0.15
hani              -0.15
.opens            -0.14
antar             -0.14
.hw               -0.14
lesi              -0.14
Redistributions   -0.14
egra              -0.14
dog               -0.14
Positive Logits
Token             Logit
ÂĢÂĻ              0.17
ruk               0.15
avia              0.15
iž                0.14
owi               0.14
inges             0.14
ako               0.14
éϽ                0.14
TZ                0.14
.tools            0.14
Activations Density: 0.100%