INDEX
Explanations
references to religion
Negative Logits
kus          -0.83
Lans         -0.70
berries      -0.66
ptives       -0.66
ription      -0.66
hyde         -0.66
berry        -0.66
apple        -0.65
Rober        -0.63
Lag          -0.62
Positive Logits
ophobia      1.02
ophobic      0.95
ophob        0.88
istical      0.88
fulness      0.85
hood         0.84
affiliation  0.82
ically       0.81
ists         0.80
tenets       0.79
Activations Density 0.039%