INDEX
Explanations
Personal and possessive pronouns, especially those referring to the speaker or writer
Negative Logits
ories            -0.74
iston            -0.63
°                -0.61
Centers          -0.60
incorporation    -0.59
orie             -0.59
majority         -0.58
ftime            -0.57
profits          -0.57
nce              -0.56
Positive Logits
personally       1.06
lees             0.88
adows            0.87
atic             0.84
somew            0.79
andering         0.78
verbally         0.75
erk              0.74
orally           0.73
uncond           0.72
Activations Density 0.746%