INDEX
Explanations
phrases related to negative emotions and personal struggles
expressions of emotional distress and confusion
Negative Logits
ortium           -0.62
ertodd           -0.53
inances          -0.50
artney           -0.50
Preferred        -0.49
senal            -0.48
igham            -0.47
referen          -0.47
chronological    -0.46
aeper            -0.46
Positive Logits
Semitism         0.52
inflicted        0.50
flares           0.50
amaz             0.49
!]               0.47
motives          0.47
spilled          0.46
inability        0.46
thirst           0.46
uncontroll       0.45
Activations Density: 1.755%