INDEX
Explanations
instances of the letter "i" that are part of a specific pattern
references to the pronoun "I."
Negative Logits
Token          Logit
Down           -0.74
Territ         -0.73
ãĥ¼ãĥĨ         -0.73
rique          -0.72
Territories    -0.72
Rebels         -0.69
Reef           -0.66
lain           -0.66
Everett        -0.65
Valkyrie       -0.64
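The token ãĥ¼ãĥĨ in the negative-logit list looks like a byte-level BPE vocabulary string rather than corrupted text. The sketch below assumes the model uses a GPT-2-style byte-to-character table (the page does not state this) and maps each displayed character back to its byte before decoding as UTF-8. The function names are hypothetical and chosen for illustration only.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2's byte-level BPE.

    Assumption: the dashboard's tokenizer follows this convention.
    """
    # Bytes that are shown as themselves.
    printable = (
        list(range(0x21, 0x7F))      # '!' .. '~'
        + list(range(0xA1, 0xAD))    # '¡' .. '¬'
        + list(range(0xAE, 0x100))   # '®' .. 'ÿ'
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is shifted to an unused code point from 256 upward.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_byte_level_token(token: str) -> str:
    """Recover the text a byte-level vocabulary string stands for."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_byte_level_token("ãĥ¼ãĥĨ"))  # ーテ
```

Under that assumption the token is a two-character katakana fragment. Plain ASCII entries such as GPU decode to themselves.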
Positive Logits
Token          Logit
pec            1.15
ibo            1.10
ota            1.08
GPU            1.03
wi             1.00
ordan          0.97
omm            0.95
wb             0.93
ulia           0.92
'm             0.90
Activations Density 0.034%
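
To give a sense of scale for the density figure, here is a minimal sketch. It assumes that "Activations Density" means the fraction of sampled tokens on which the feature has a nonzero activation. The page does not define the term, so that reading, the threshold parameter, and the function names are all assumptions.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: density counts a token as active when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def expected_active_tokens(density_percent, n_tokens):
    """Convert a density given in percent into an expected token count."""
    return density_percent / 100.0 * n_tokens


# Toy input: one of four tokens is active.
print(activation_density([0.0, 0.0, 1.3, 0.0]))

# A density of 0.034% over one million tokens.
print(round(expected_active_tokens(0.034, 1_000_000)))
```

Read this way, a density of 0.034% corresponds to roughly 340 active tokens per million, which would make this a sparse feature.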