INDEX
Explanations
key phrases indicating actions, relationships, or characteristics relevant to significant societal issues
Negative Logits
ереж       -0.15
peq        -0.15
wz         -0.15
ingham     -0.15
ocol       -0.14
_unused    -0.14
uces       -0.14
htable     -0.14
rades      -0.14
endi       -0.14
Positive Logits
something  0.32
something  0.31
Something  0.27
Something  0.26
omething   0.20
něco       0.18
iets       0.18
etwas      0.17
何か       0.17
somehow    0.16
Activations Density 0.005%