INDEX
Explanations
special character sequences that likely represent formatting or encoding issues
Negative Logits
Token       Logit
ia          -0.19
j           -0.16
kla         -0.16
MLA         -0.15
akk         -0.14
ongyang     -0.14
alem        -0.14
ENA         -0.14
Starbucks   -0.14
aug         -0.13
Positive Logits
Token       Logit
Mos         0.28
Mos         0.21
bomber      0.18
bombers     0.17
Coastal     0.16
Bom         0.16
mos         0.16
ãĥ©ãĥĥãĤ¯   0.16
/os         0.15
moz         0.15
Activations Density 0.002%
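
The positive-logit token ãĥ©ãĥĥãĤ¯ looks like the printable stand-in that GPT-2-style byte-level BPE vocabularies use for raw UTF-8 bytes. The source does not say which tokenizer produced these tokens, so this is an assumption. Under that assumption, the sketch below rebuilds the byte-to-character table and maps the token back to readable text. The helper names `bytes_to_unicode` and `decode_token` are illustrative and are not taken from the source.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2 byte-level BPE.

    Assumption: the dashboard tokens come from such a vocabulary.
    """
    # Bytes that are already printable keep their own code point.
    keep = set(range(ord("!"), ord("~") + 1))
    keep |= set(range(ord("¡"), ord("¬") + 1))
    keep |= set(range(ord("®"), ord("ÿ") + 1))

    table = {}
    extra = 0
    for b in range(256):
        if b in keep:
            table[b] = chr(b)
        else:
            # Remaining bytes are shifted to code points 256, 257, ...
            table[b] = chr(256 + extra)
            extra += 1
    return table


# Inverse table: printable stand-in character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into UTF-8 text (hypothetical helper)."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥ©ãĥĥãĤ¯"))  # prints ラック
```

If the assumption holds, the token is the katakana string ラック rather than an encoding fault. Plain ASCII tokens such as Mos pass through unchanged.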