INDEX
Explanations
phrases indicating personal opinions, beliefs, or thoughts
pronouns, particularly those referring to people, in context
Negative Logits
odder        -0.65
Hicks        -0.64
ardon        -0.63
epad         -0.61
Bots         -0.59
Cunningham   -0.59
İĭ           -0.59
OTOS         -0.57
inational    -0.57
Bland        -0.57
Positive Logits
fared        0.91
handled      0.81
'd           0.81
've          0.81
fucked       0.80
stacked      0.78
cope         0.76
relates      0.76
handle       0.75
behave       0.75
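The two lists rank tokens by a scalar logit effect and show the ten strongest in each direction. A minimal sketch of that selection step, assuming each token already has one such score (how the dashboard derives the scores is not stated here). The helper name `top_and_bottom_k` is hypothetical, and the sample values are a subset copied from the lists above.

```python
def top_and_bottom_k(logit_effects, k):
    """Return the k most positive and k most negative (token, score) pairs.

    Hypothetical helper: assumes logit_effects maps token -> scalar score.
    """
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])  # ascending by score
    bottom = ranked[:k]  # most negative first
    top = sorted(ranked[-k:], key=lambda kv: kv[1], reverse=True)  # most positive first
    return top, bottom


# Subset of the values listed above.
effects = {"fared": 0.91, "handled": 0.81, "cope": 0.76,
           "odder": -0.65, "Hicks": -0.64, "Bland": -0.57}
top, bottom = top_and_bottom_k(effects, 2)
print(top)     # [('fared', 0.91), ('handled', 0.81)]
print(bottom)  # [('odder', -0.65), ('Hicks', -0.64)]
```

Ties, such as the several tokens at -0.59 or 0.81, keep their insertion order because Python's sort is stable.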
Activation density: 0.113%
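A density of 0.113% means the feature is active on roughly 1 token in 885. A minimal sketch of that statistic follows. It assumes "density" is the percentage of tokens whose activation exceeds zero; the dashboard's exact threshold is not stated. The function name and the toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is > threshold.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy data: one active token out of 1,000.
acts = [0.0] * 999 + [1.3]
print(f"{activation_density(acts):.3f}%")  # 0.100%
```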