INDEX
Explanations
references to individuals named Muhammad or related variants in a political context
Negative Logits
isin      -0.16
Samar     -0.16
ulton     -0.16
acing     -0.14
pedia     -0.14
serrat    -0.14
AZE       -0.14
diffs     -0.14
pcs       -0.14
Garrison  -0.14
Positive Logits
oph     0.17
u       0.16
ÙİØ§    0.16
bu      0.16
uÄį     0.15
strup   0.15
awi     0.14
umm     0.14
ÑĩиÑĤ   0.14
Ñĥ      0.14
Activations Density 0.008%