Explanations
personal pronouns followed by verbs or possessive pronouns
pronouns referring to individuals
Negative Logits
Token        Logit
earch        -0.72
arsity       -0.67
atlantic     -0.64
aughtered    -0.59
iaz          -0.57
Gulf         -0.55
cyclop       -0.55
ãĥ¬          -0.54
itol         -0.54
atory        -0.53
Positive Logits
Token        Logit
'll          1.04
've          1.03
'd           0.96
're          0.92
knew         0.81
adore        0.79
despise      0.73
cannot       0.73
self         0.71
can          0.71
Activations Density 0.646%
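
As a rough illustration of what a density figure like the one above measures, the sketch below computes the fraction of token positions on which a feature fires. It assumes density means "share of sampled tokens with activation above a threshold"; the function name `activation_density`, the zero threshold, and the sample values are all hypothetical and do not come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature is active; the dashboard's exact definition may differ.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 6 of which activate the feature.
sample = [0.0] * 994 + [0.3, 1.2, 0.8, 2.5, 0.1, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.600%"
```

A value below 1%, as reported here, would be consistent with a sparse feature that fires only on a narrow class of tokens, such as pronouns followed by verbs or contractions.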