INDEX
Explanations
references to religious practices and institutions
Negative Logits
TL       -0.16
uong     -0.15
isci     -0.15
alli     -0.15
opez     -0.15
hoa      -0.14
knull    -0.14
uco      -0.14
croft    -0.14
747      -0.14
Positive Logits
ÑĢаÐ     0.17
apan     0.16
åĴ²      0.15
AUSE     0.15
aina     0.14
éŀ       0.14
Gingrich 0.14
Minor    0.14
/gtest   0.14
Bain     0.14
Activations Density 0.301%