INDEX
Explanations
mentions of the term "robot"
words related to various forms of emotional states or feelings
Negative Logits
Spread      -0.77
variance    -0.72
lapt        -0.68
ADRA        -0.67
indo        -0.65
envy        -0.65
affinity    -0.64
Spoiler     -0.62
divorced    -0.62
experien    -0.61
Positive Logits
velt        0.85
rences      0.84
monary      0.80
warts       0.79
merce       0.77
ingo        0.76
enthal      0.76
oscopic     0.74
cair        0.73
estern      0.72
Activations Density: 0.157%