INDEX
Explanations
phrases related to religious beliefs
references to beliefs and their implications
Negative Logits
Token       Logit
ptives      -0.74
neau        -0.70
cephal      -0.66
Delivery    -0.66
fac         -0.63
Naz         -0.62
ded         -0.62
BUG         -0.62
othy        -0.61
sg          -0.61
Positive Logits
Token       Logit
linger      1.06
mith        0.94
hips        0.92
ettings     0.89
omething    0.88
cape        0.85
beliefs     0.85
ynski       0.85
uits        0.85
hip         0.84
Activations Density: 0.059%