Explanations
words and phrases reflecting intelligence, understanding, and societal critiques
Negative Logits
ţion          -0.42
navideña      -0.40
plazos        -0.37
сылкі         -0.36
invitado      -0.35
Dış           -0.35
afficheront   -0.35
drying        -0.34
noDo          -0.34
lső           -0.34
Positive Logits
stupidity      0.84
stupid         0.73
Stupid         0.71
stupid         0.68
Stupid         0.67
incompetence   0.65
imbec          0.64
morons         0.63
dumb           0.63
idiots         0.62
Activations Density 0.437%