INDEX
Explanations
statements asserting the existence or presence of something
Negative Logits
那些         -0.17
few          -0.16
všechny      -0.15
λικά         -0.14
chúng        -0.14
ucc          -0.14
two          -0.14
itol         -0.14
эти          -0.14
headaches    -0.13
Positive Logits
room          0.23
evidence      0.20
spou          0.20
talk          0.20
footage       0.16
ROOM          0.16
variability   0.15
mention       0.15
Room          0.15
plenty        0.15
Activations Density 0.107%