INDEX
Explanations
words related to actions that involve interaction or communication with others
occurrences of the article "a"
Negative Logits
aim            -0.63
Encyclopedia   -0.63
insult         -0.63
LSD            -0.60
Illum          -0.59
excuse         -0.58
ECB            -0.58
easy           -0.58
},{"           -0.56
opium          -0.56
Positive Logits
ria            0.78
guiActiveUn    0.78
plin           0.74
riel           0.74
rial           0.74
lex            0.73
aron           0.71
UTH            0.70
ilee           0.69
UG             0.69
Activations Density 0.146%