INDEX
Explanations
words related to political or ideological concepts, particularly those involving strong opinions or stances
instances of the character "ĺ"
Negative Logits
Seym        -0.77
enegger     -0.71
mathemat    -0.71
trainers    -0.68
therap      -0.67
Niet        -0.66
è¦ļéĨĴ      -0.66
intrins     -0.66
Reincarn    -0.66
ivory       -0.66
Positive Logits
ï¸ı         1.07
lean        0.93
log         0.87
ATH         0.83
£           0.82
ĺ           0.81
âĹ¼         0.80
fter        0.79
resent      0.79
leans       0.78
Activations Density 0.037%
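
Note on the unusual token strings: entries such as "ĺ", "ï¸ı", "âĹ¼" and "è¦ļéĨĴ" look like byte-level BPE tokens shown in the printable alphabet that GPT-2 style tokenizers use for raw bytes. That is an assumption, since this page does not name the tokenizer. Under that assumption, the sketch below rebuilds the byte-to-character table and maps a displayed token back to its bytes. The function names are illustrative and do not belong to any particular library.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table: byte value -> printable display character.

    Assumption: the dashboard shows tokens in this display alphabet.
    """
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: display character -> original byte value.
DISPLAY_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_bytes(token: str) -> bytes:
    """Return the raw bytes behind a displayed token string."""
    return bytes(DISPLAY_TO_BYTE[ch] for ch in token)


def decode_token(token: str) -> str:
    """Decode a displayed token to text; incomplete UTF-8 becomes U+FFFD."""
    return token_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ĺ", "ï¸ı", "âĹ¼", "è¦ļéĨĴ", "lean"]:
        print(repr(tok), token_bytes(tok).hex(" "), repr(decode_token(tok)))
```

Read this way, "ĺ" is the single byte 0x98, which is a UTF-8 continuation byte and not a complete character, so the second explanation above may be describing a byte fragment of a longer multi-byte character.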