Explanations
references to newspapers and journalistic content
Negative Logits
ument    -0.19
erez     -0.18
Binder   -0.16
선       -0.15
et       -0.15
운데     -0.14
hay      -0.14
tum      -0.13
way      -0.13
ouver    -0.13
Positive Logits
仔       0.17
raith    0.15
isyon    0.15
ulance   0.14
θεν      0.14
оит      0.14
peon     0.13
reek     0.13
Sonata   0.13
centage  0.13
Activations Density 0.009%