Explanations
Instances of capitalization, likely proper nouns or other significant terms
Negative Logits
nen         -0.19
li          -0.18
nes         -0.17
nees        -0.17
ss          -0.17
lo          -0.17
rd          -0.17
º           -0.17
rome        -0.16
loe         -0.15

Positive Logits
izabeth     0.17
SCO         0.16
Åijs        0.15
ichel       0.15
-disable    0.15
Ãłnh        0.15
/disable    0.15
erald       0.14
zzo         0.14
enis        0.14
Activations Density 0.147%