INDEX
Explanations
organized information about various societal issues and trends
Negative Logits
(        -0.17
         -0.16
yet      -0.16
had      -0.15
C        -0.15
rior     -0.15
as       -0.15
Roses    -0.15
Bro      -0.14
irc      -0.14
Positive Logits
İY       0.18
$MESS    0.18
İÅŀ      0.17
NECT     0.17
égor     0.17
oldur    0.15
$LANG    0.15
estone   0.15
owie     0.15
swer     0.15
Activations Density 0.123%