Explanations
references to religious figures and their roles within the church
Negative Logits
ea      -0.15
lesh    -0.14
cio     -0.14
ohl     -0.14
eil     -0.14
ãİ¡     -0.14
rew     -0.14
punch   -0.13
endif   -0.13
atters  -0.13
Positive Logits
Ŀ          0.16
ostel      0.15
onyms      0.15
assin      0.14
rosis      0.14
ylon       0.14
alary      0.14
ìĥ         0.14
onym       0.14
Kavanaugh  0.13
Activations Density 0.026%