INDEX
Explanations
sentences ending with "said"
special characters or symbols in the text
Negative Logits
decomp       -0.85
Marble       -0.76
Myster       -0.75
Discord      -0.73
Nept         -0.72
Voyager      -0.71
Manhattan    -0.70
clutter      -0.70
gray         -0.69
warp         -0.68
Positive Logits
¬            0.97
¹            0.97
£            0.96
į            0.95
Asia         0.92
ais          0.92
Į            0.90
AFP          0.90
Iraq         0.87
ech          0.87
Activations Density: 0.427%