Explanations
pronouns and possessive pronouns referring to people
references to individuals in various contexts
Negative Logits
Amen        -0.81
..."        -0.68
catentry    -0.66
isher       -0.64
Untitled    -0.63
venge       -0.62
)"          -0.60
}}          -0.59
)",         -0.59
limits      -0.58
Positive Logits
've         1.00
'd          0.94
'll         0.83
're         0.80
pherd       0.78
didn        0.78
certainly   0.78
miah        0.76
grew        0.75
pard        0.73
Activations Density: 0.305%