Explanations
phrases or terms related to critical socio-political discourse
Negative Logits
fortun      -0.91
sacrific    -0.88
mathemat    -0.84
myster      -0.84
comr        -0.79
disadvant   -0.78
suspic      -0.77
notor       -0.75
hurd        -0.75
cryst       -0.73
Positive Logits
ï¸ı         1.32
ski         1.11
tu          0.95
tal         0.95
tre         0.94
sky         0.93
sic         0.92
tsy         0.91
heim        0.90
ti          0.90
Activations Density 0.150%