INDEX
Explanations
words related to ideologies or belief systems
Negative Logits
FACE        -0.80
ãģ®éŃĶ      -0.77
shown       -0.73
STON        -0.70
é¾          -0.68
MENTS       -0.68
zens        -0.64
ths         -0.64
ĸļ          -0.63
ãĥ¼ãĥ«      -0.61
Positive Logits
ische       0.91
tendencies  0.86
extraord    0.82
ischer      0.82
otle        0.81
ophical     0.78
ribution    0.78
manifesto   0.76
ricting     0.75
emi         0.74
Activations Density 0.050%