Explanations
mentions of particular individuals or works
NEGATIVE LOGITS
σκε       -0.17
odash     -0.16
ANCELED   -0.16
.errors   -0.15
nze       -0.14
merce     -0.14
うち      -0.14
회사      -0.14
にて      -0.13
91        -0.13
POSITIVE LOGITS
生的      0.31
Fs        0.31
Ps        0.29
样的      0.29
ys        0.29
Gs        0.28
人的      0.28
好的      0.27
Ns        0.27
上的      0.27
Activation density: 0.837%
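The dashboard does not state how these numbers are produced. The following is a minimal sketch under two assumptions: activation density is the percentage of token positions where the feature's activation is above zero, and the logit lists rank vocabulary tokens by the dot product of the feature's decoder direction with each token's unembedding column. The names `activation_density` and `top_logit_effects`, and all of their parameters, are hypothetical and do not correspond to any particular tool's API.

```python
from typing import List, Sequence, Tuple


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


def top_logit_effects(
    decoder_dir: Sequence[float],        # hypothetical feature decoder vector, length d_model
    unembed: Sequence[Sequence[float]],  # hypothetical W_U, shape d_model x vocab_size
    vocab: Sequence[str],                # token strings, length vocab_size
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most_negative, most_positive) tokens by assumed direct logit effect."""
    # Score for token j is the dot product of the decoder direction with column j of W_U.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]          # lowest scores first
    most_positive = ranked[::-1][:k]    # highest scores first
    return most_negative, most_positive
```

For example, `activation_density([0.0, 0.5, 0.0, 0.0, 1.2])` returns `40.0`, since two of five positions are active. A density of 0.837% would mean the feature fires on fewer than one token in a hundred under this assumed definition.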