INDEX
Explanations
pronouns indicating possession
possessive pronouns indicating ownership or belonging
Negative Logits
gravity        -0.74
establishment  -0.70
positive       -0.69
iple           -0.67
erate          -0.67
Edge           -0.63
lined          -0.63
edd            -0.62
orsi           -0.62
trump          -0.62
Positive Logits
selves         1.04
self           0.90
creen          0.74
uria           0.71
ovie           0.69
RPG            0.68
hers           0.67
tam            0.65
theirs         0.64
ours           0.64
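The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. Below is a minimal sketch of how such a ranking can be produced. It assumes, as in common sparse-autoencoder dashboards but not confirmed by this page, that each score is the dot product of the feature's decoder direction with a token's unembedding vector. The function names and toy vectors are hypothetical and were invented for illustration:

```python
def logit_effects(decoder_dir, unembed):
    """Score each token by dotting the feature's decoder direction with its unembedding vector.

    decoder_dir: list of d_model floats (hypothetical decoder row for one feature)
    unembed: dict mapping token string -> list of d_model floats (one W_U column per token)
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most-promoted and k most-suppressed tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy 2-dimensional example; these vectors are made up, not taken from the model.
effects = logit_effects(
    [1.0, 0.5],
    {"selves": [2.0, 1.0], "self": [1.0, 1.0], "gravity": [-2.0, 0.0], "trump": [-1.0, 0.0]},
)
top, bottom = top_and_bottom(effects, 2)
print(top)     # [('selves', 2.5), ('self', 1.5)]
print(bottom)  # [('gravity', -2.0), ('trump', -1.0)]
```

Sorting once and reading both ends keeps the two lists consistent with each other. A real dashboard would do the same thing over the full vocabulary, typically with a vectorized matrix product in place of the Python loop.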
Activations Density 0.013%
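Activation density is the share of token positions on which the feature fires. A density of 0.013% works out to roughly 13 firing positions per 100,000 tokens. The sketch below assumes that "fires" means an activation strictly above zero, because the dashboard's exact threshold is not stated here. `activation_density` is a hypothetical helper:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption (typical for ReLU-style SAEs).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 13 nonzero activations spread over 100,000 token positions
acts = [0.0] * 100_000
for i in range(13):
    acts[i * 1000] = 1.0

print(f"{activation_density(acts):.3%}")  # 0.013%
```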