INDEX
Explanations
references to religious texts and concepts
Negative Logits
845       -0.15
Hussein   -0.14
beg       -0.14
945       -0.14
repl      -0.14
trif      -0.14
bard      -0.14
ella      -0.14
tea       -0.13
abad      -0.13
Positive Logits
Sur       0.19
البلد     0.19
sur       0.19
-regexp   0.18
-sur      0.17
úb        0.17
'gc       0.17
Sur       0.16
осп       0.15
chrome    0.15
Activations Density 0.012%