INDEX
Explanations
words related to facial expressions or emotions
words and phrases related to physical body parts or actions
Negative Logits
OTOS     -0.81
IRE      -0.63
LV       -0.60
aurus    -0.60
NL       -0.59
Jav      -0.58
Games    -0.57
alogue   -0.57
odium    -0.56
RIP      -0.55
Positive Logits
ingly    0.89
puff     0.83
buck     0.83
abouts   0.81
legged   0.77
yip      0.76
ing      0.73
edin     0.73
toe      0.72
footed   0.71
Activations Density 0.118%