INDEX
Explanations
references to religious institutions, specifically churches
Negative Logits
eel            -0.18
ãģĬãĤĬ         -0.15
Ø«             -0.15
orent          -0.14
urity          -0.14
.infinity      -0.14
ook            -0.14
obus           -0.14
Specialists    -0.14
uary           -0.14
Positive Logits
yard           0.21
lett           0.18
(es            0.17
zeitig         0.16
erm            0.16
worm           0.16
going          0.15
rooms          0.15
TokenType      0.15
wide           0.15
Activations Density: 0.028%