Explanations
references to authoritative religious beliefs and critiques of religion
Head Attr Weights
0:0.06
1:0.03
2:0.06
3:0.12
4:0.05
5:0.07
6:0.02
7:0.03
8:0.06
9:0.15
10:0.21
11:0.07
Negative Logits
Prompt          -1.24
showc           -1.21
ransomware      -1.19
ALS             -1.18
senal           -1.17
Slug            -1.15
shapeshifter    -1.15
prank           -1.15
Cummings        -1.14
newsp           -1.14
Positive Logits
precept         1.45
leness          1.44
subord          1.44
utopian         1.40
earthly         1.36
growth          1.35
societies       1.33
democracies     1.32
trillions       1.29
).[             1.27
Activations Density 0.914%