Explanations
references to masks and their political implications
NEGATIVE LOGITS
deniz          -0.18
ensen          -0.17
ipel           -0.15
CEED           -0.15
alse           -0.15
redentials     -0.15
atch           -0.15
istrovství     -0.14
aliz           -0.14
mant           -0.14
POSITIVE LOGITS
rips            0.15
oux             0.14
iji             0.14
ous             0.14
815             0.14
quis            0.13
_dispatcher     0.13
685             0.13
akan            0.13
nez             0.13
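The page does not say how these two lists are produced. A common approach for sparse-autoencoder features, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and read off the most negative and most positive entries. The sketch below illustrates that assumption in plain Python; `top_logit_effects`, `decoder_vec`, `W_U` and `vocab` are hypothetical names, not the API of any particular tool.

```python
def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Hypothetical helper: rank tokens by the feature's direct logit effect.

    Assumes decoder_vec has length d_model and W_U is a d_model x d_vocab
    nested list (the unembedding matrix), with vocab[j] naming column j.
    """
    d_model, d_vocab = len(decoder_vec), len(vocab)
    # Effect on each token's logit: dot product of the decoder direction
    # with that token's column of the unembedding matrix.
    logits = [
        sum(decoder_vec[i] * W_U[i][j] for i in range(d_model))
        for j in range(d_vocab)
    ]
    order = sorted(range(d_vocab), key=lambda j: logits[j])  # ascending
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example with d_model = 2 and a three-token vocabulary.
neg, pos = top_logit_effects([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=2)
print(neg)  # [('b', -0.2), ('c', 0.1)]
print(pos)  # [('a', 0.5), ('c', 0.1)]
```

Under this reading, a value such as -0.18 for "deniz" would be the change in that token's logit per unit of feature activation, before any softmax.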
Activations Density 0.118%
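Activation density is read here, as an assumption, as the fraction of sampled token positions on which the feature fires (activation above zero). On that reading, 0.118% means the feature is active on roughly one token in 850. A minimal sketch with a hypothetical `activation_density` helper:

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions where the feature fires.

    `activations` is assumed to be a flat list of this feature's activations,
    one value per sampled token position.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


acts = [0.0] * 999 + [1.2]  # one firing position out of 1000
print(f"{activation_density(acts):.3%}")  # 0.100%
```

Formatting the result with `:.3%` reproduces the percentage style used on the dashboard.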