INDEX
Explanations
phrases related to motivations, decisions, and achievements
special characters or unusual formatting in the text
Negative Logits
scatter      -0.59
decomp       -0.52
cyan         -0.51
scattering   -0.51
buggy        -0.51
shack        -0.50
bed          -0.49
Nib          -0.48
radar        -0.47
coast        -0.47
Positive Logits
¹    0.85
£    0.83
ı    0.79
º    0.79
¡    0.78
Ĵ    0.78
ł    0.77
į    0.76
¬    0.72
§    0.72
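Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive tokens. The sketch below assumes that method; it is not confirmed as how this dashboard computed its numbers. `top_logit_effects`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, not this dashboard's API. Any normalization the dashboard may apply to its scores is not reproduced.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumes decoder_dir has shape (d_model,) and W_U has shape
    (d_model, vocab_size); both are hypothetical inputs for illustration.
    """
    # Direct effect of the feature direction on every output token.
    effects = decoder_dir @ W_U
    # Sort token ids from most negative to most positive effect.
    order = np.argsort(effects)
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.9],
                [0.3, 0.1, -0.4]])
neg, pos = top_logit_effects(decoder_dir, W_U, ["a", "b", "c"], k=1)
print(neg, pos)
```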
Activation density: 0.493%
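One reading of activation density is the percentage of tokens on which the feature is active. Under that reading, 0.493% means the feature fires on about one token in 200 (1 / 0.00493 ≈ 203). The minimal sketch below makes several assumptions. It takes per-token activations as a NumPy array and counts a token as active when its activation exceeds a threshold of 0. That threshold is a hypothetical default, since the dashboard's exact cutoff is not stated. `activation_density` is an illustrative name.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(acts, dtype=float)
    active = np.count_nonzero(acts > threshold)
    return 100.0 * active / acts.size

# Toy example: the feature fires on 2 of 10 tokens.
acts = [0.0, 0.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(activation_density(acts))
```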