INDEX
Explanations
expressions of fear or concern
Negative Logits
çĦ            -0.79
brates        -0.72
pta           -0.71
eness         -0.71
slice         -0.69
cence         -0.68
nice          -0.68
Kinnikuman    -0.67
ety           -0.66
OVA           -0.66
Positive Logits
retribution      1.11
repercussions    1.01
lest             0.97
repr             0.88
retaliation      0.85
pregn            0.78
imminent         0.77
impending        0.71
suicidal         0.71
unsafe           0.69
Activations Density: 0.110%