INDEX
Explanations
strong emotional responses or actions
Negative Logits
Token        Logit
_>           -0.66
withdrawn    -0.60
alternatives -0.60
withdrawing  -0.59
guid         -0.58
latitude     -0.58
commit       -0.57
consulted    -0.56
wielding     -0.55
admin        -0.55
POSITIVE LOGITS
soType
0.74
nerves
0.72
tails
0.67
yout
0.67
tail
0.67
terness
0.62
netflix
0.62
uates
0.62
¿½
0.61
morale
0.61
Activations Density 0.218%