INDEX
Explanations
words related to specific names or people
proper nouns, specifically names of people and locations
Negative Logits
iery        -0.60
irens       -0.56
mble        -0.56
ãĥŁ         -0.55
à¨          -0.54
ß           -0.54
Calais      -0.54
Debor       -0.54
notation    -0.53
reviewed    -0.53
Positive Logits
himself     0.99
's          0.94
realizes    0.78
knew        0.75
Himself     0.75
knows       0.74
â̲          0.73
herself     0.72
remembers   0.70
Sr          0.70
Activations Density 0.284%