INDEX
Explanations
a variety of text related to religion and atheism
Negative Logits
tein     -0.63
lim      -0.61
NP       -0.60
ISA      -0.57
SN       -0.56
bow      -0.56
Lemon    -0.56
nen      -0.55
glas     -0.54
iman     -0.54
Positive Logits
respectively    1.50
etc             1.47
etc             1.38
anything        0.94
alike           0.92
Lastly          0.92
whichever       0.91
depending       0.89
assorted        0.88
blah            0.81
Activations Density 5.030%