Explanations
phrases related to societal issues and global politics
the presence of specific symbols or characters in the text
Negative Logits
buggy        -0.74
decomp       -0.71
scattering   -0.71
scatter      -0.70
smokes       -0.69
anwhile      -0.69
glers        -0.68
rooting      -0.68
lodging      -0.68
dumping      -0.67
Positive Logits
£            1.09
º            0.97
¹            0.94
âĹ           0.89
Serv         0.88
»            0.86
®            0.86
¡            0.86
Hon          0.86
âĢº          0.83
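
Several positive-logit entries, such as âĢº and âĹ, look like encoding debris but may be byte-level BPE token strings shown as stored in the vocabulary. The sketch below assumes (the source does not say so) that the tokens come from a GPT-2-style byte-level vocabulary, and maps each displayed character back to the raw byte it stands for. The helper names bytes_to_unicode_table and token_to_bytes are hypothetical, introduced only for illustration.

```python
def bytes_to_unicode_table() -> dict[int, str]:
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    Assumption: the dashboard's model uses this scheme. Printable bytes map
    to themselves; the remaining bytes are shifted to code points >= 256.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(cp) for b, cp in zip(byte_values, code_points)}


# Inverse table: displayed character -> raw byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode_table().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed vocabulary string."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


if __name__ == "__main__":
    for tok in ["âĢº", "âĹ", "£"]:
        raw = token_to_bytes(tok)
        # errors="replace" keeps incomplete UTF-8 sequences from raising.
        print(repr(tok), raw.hex(" "), repr(raw.decode("utf-8", errors="replace")))
```

Under that assumption, âĢº corresponds to the bytes E2 80 BA, which is the UTF-8 encoding of "›", while âĹ corresponds to E2 97, an incomplete multi-byte prefix, and £ would be the lone byte A3 rather than a pound sign. If the model uses a different tokenizer, this mapping does not apply and the entries should be read as displayed.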
Activations Density: 0.266%