INDEX
Explanations
patterns of words indicating comparisons or references to groups
Negative Logits
umpt      -0.17
globals   -0.15
-li       -0.15
ollar     -0.15
ÄĻk       -0.15
loh       -0.14
orra      -0.14
aeda      -0.14
queues    -0.14
èĢ        -0.14
Positive Logits
us        0.28
them      0.20
.us       0.17
ender     0.16
aze       0.15
(us       0.14
ssi       0.14
Us        0.14
igin      0.14
455       0.14
Activations Density: 0.064%