INDEX
Explanations
proper nouns
references to the word "Ba" followed by various numerical identifiers or context descriptors
Negative Logits
pants       -0.79
lessly      -0.76
wise        -0.74
otle        -0.68
address     -0.67
REDACTED    -0.67
ments       -0.64
dress       -0.60
weed        -0.60
Leone       -0.60
Positive Logits
uble        1.20
uman        1.14
iley        1.12
umann       1.11
plin        1.07
iting       1.03
atar        1.01
ÅŁ          0.99
ñ           0.99
um          0.97
Activations Density 0.026%
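
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: the fraction of token positions on which the feature's activation is above zero. This is an assumption about the metric, not a statement of how this dashboard computes it. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only so that the result matches the 0.026% shown above.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    The dashboard's actual definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 26 of 100,000 token positions.
example = [1.0] * 26 + [0.0] * 99_974
density = activation_density(example)
print(f"{density:.3%}")  # prints "0.026%"
```

A density this low would mean the feature is sparse: under the assumed definition it fires on roughly one token in every 3,800.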