INDEX
Explanations
references to or mentions of people, specifically with words like "someone" or "somebody" in various contexts
references to undefined or vague individuals
NEGATIVE LOGITS
tnc         -0.93
UV          -0.71
ean         -0.70
osterone    -0.69
iven        -0.66
Rom         -0.66
Vert        -0.65
urer        -0.64
effect      -0.63
imon        -0.62
POSITIVE LOGITS
else         1.42
Else         1.07
stole        1.01
else         0.98
forgot       0.94
Else         0.91
intervened   0.83
coined       0.83
wants        0.81
wrote        0.81
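The two lists above are the ten most negative and ten most positive entries of a per-token logit-effect vector. Dashboards of this kind typically obtain that vector by projecting the feature's decoder direction through the model's unembedding matrix; the sketch below takes the vector as given and only shows the ranking step. The function name `top_logits` and its inputs are hypothetical, not part of any dashboard's API.

```python
def top_logits(effects, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    Assumption: `effects[i]` is the logit effect of this feature on the
    vocabulary entry `vocab[i]` (both inputs are hypothetical).
    """
    if len(effects) != len(vocab):
        raise ValueError("effects and vocab must have the same length")
    # Indices sorted by effect, smallest (most negative) first.
    order = sorted(range(len(effects)), key=lambda i: effects[i])
    negative = [(vocab[i], effects[i]) for i in order[:k]]
    # Largest effects, listed from the most positive downward.
    positive = [(vocab[i], effects[i]) for i in reversed(order[-k:])]
    return negative, positive


neg, pos = top_logits([0.5, -1.0, 2.0, 0.0], ["a", "b", "c", "d"], k=2)
print(neg)  # [('b', -1.0), ('d', 0.0)]
print(pos)  # [('c', 2.0), ('a', 0.5)]
```

Note that distinct tokenizer entries can render identically once leading whitespace is dropped, which is why "else" and "Else" can each appear twice in the list above.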
Activation density: 0.104%
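Activation density is usually read as the fraction of sampled tokens on which the feature fires, so 0.104% corresponds to roughly one token in every 960 or so. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the threshold and the function name `activation_density` are assumptions for illustration):

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as firing when its activation is
    strictly greater than the threshold (0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


acts = [0.0] * 999 + [3.2]  # one firing token out of 1000
print(f"{activation_density(acts):.3%}")  # 0.100%
```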