Explanations
mentions of various religious denominations and church affiliations
Negative Logits
Token        Logit
åĽº          -0.19
oller        -0.17
esel         -0.16
hots         -0.15
,[],         -0.15
ections      -0.14
_compiler    -0.14
hal          -0.13
ickets       -0.13
оÑĩно        -0.13
Positive Logits
Token        Logit
dom          0.16
/wiki        0.15
dehyde       0.15
OPY          0.15
Walls        0.14
/forum       0.14
اÙĪÙĬØ©       0.14
assin        0.14
Ire          0.14
vent         0.14
Activations Density: 0.011%