INDEX
Explanations
references to religious figures and communities
Negative Logits
ンツ        -0.16
Laden     -0.15
iane      -0.14
ucu       -0.14
Metals    -0.14
rzy       -0.14
ubo       -0.13
udic      -0.13
.native   -0.13
Sabbath   -0.13
Positive Logits
adel      0.16
spin      0.15
spun      0.14
Hosp      0.14
osp       0.14
abet      0.14
etas      0.14
strncpy   0.14
ठन        0.13
_iff      0.13
Activations Density 0.032%
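
The density figure can be read as the share of token positions on which the feature fires at all. The sketch below illustrates that reading only: it assumes "density" means the fraction of positions with an activation above zero, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical rather than taken from the dashboard's own pipeline.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active; the dashboard's exact definition is not given.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 32 of 100,000 token positions.
sample = [0.0] * 99_968 + [1.5] * 32
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.032%"
```

Under this reading, a density of 0.032% corresponds to roughly one active position in every 3,125 tokens, which is typical of a sparse feature.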