Explanations
the word "our" with a high activation value
possessive pronouns indicating ownership or belonging
Negative Logits
silent       -0.61
glitch       -0.60
missing      -0.57
haz          -0.56
nod          -0.56
crossover    -0.56
undead       -0.55
intellig     -0.55
losers       -0.55
loser        -0.55
Positive Logits
our          4.54
ours         2.97
ouring       2.61
oured        2.54
OUR          2.54
orous        1.46
orously      1.41
ourn         1.37
ourage       1.29
bour         1.27
Activations Density: 0.018%