INDEX
Explanations
text related to philosophical, political, or historical contexts
Negative Logits
predec     -0.70
buggy      -0.70
scatter    -0.67
stricken   -0.66
lodging    -0.65
shroud     -0.64
clad       -0.64
neglig     -0.63
closest    -0.63
decomp     -0.63
POSITIVE LOGITS
º    1.23
£    1.09
¹    1.07
¡    0.94
®    0.94
į    0.91
¬    0.91
»    0.90
Ĵ    0.89
Ń    0.88
Activations Density: 0.236%