Explanations
mentions of Islam and related terminology
Negative Logits
Token       Logit
manship     -0.17
umen        -0.17
ake         -0.16
ango        -0.15
aman        -0.15
/kernel     -0.15
째          -0.14
age         -0.14
ui          -0.14
Sen         -0.14
Positive Logits
Token             Logit
+:+               0.17
Č↵                0.16
.scalablytyped    0.16
->___             0.15
Binder            0.15
abyrinth          0.15
.synthetic        0.15
(SIG              0.14
afen              0.14
Binder            0.14
Activations Density: 0.020%