INDEX
Explanations
references to verses from religious texts
frequent references to biblical verses and passages
Negative Logits
uggest      -0.74
behavi      -0.72
ecided      -0.67
ificant     -0.65
inarily     -0.65
athered     -0.64
dinand      -0.64
ificantly   -0.64
misunder    -0.63
uously      -0.62
Positive Logits
00          0.96
58          0.92
53          0.90
59          0.90
30          0.89
20439       0.89
52          0.88
344         0.88
51          0.88
54          0.87
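The two lists above rank vocabulary tokens by how strongly the feature pushes each one up or down. A minimal sketch follows. It assumes the dashboard scores every token as the dot product of the feature's decoder direction with the corresponding column of the unembedding matrix W_U (d_model x vocab), then takes the top and bottom k. The names feature_logits, top_and_bottom, the 4-dimensional direction and the 6-token vocabulary are all hypothetical toy values, not this feature's real weights.

```python
def feature_logits(decoder_dir, unembed):
    """Score each vocab token j as sum_i decoder_dir[i] * unembed[i][j] (i.e. decoder_dir @ W_U).

    Assumption: unembed is a d_model x vocab_size list of rows.
    """
    vocab_size = len(unembed[0])
    return [sum(d * row[j] for d, row in zip(decoder_dir, unembed)) for j in range(vocab_size)]


def top_and_bottom(vocab, scores, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative logits
    top = ranked[::-1][:k]     # most positive logits
    return top, bottom


# Hypothetical toy data: 4-dim residual stream, 6-token vocabulary.
vocab = ["00", "58", "53", "uggest", "behavi", "the"]
W_U = [
    [0.9, 0.8, 0.7, -0.6, -0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.2, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
decoder_dir = [1.0, 0.0, -1.0, 0.5]

top, bottom = top_and_bottom(vocab, feature_logits(decoder_dir, W_U), k=2)
print([t for t, _ in top])     # ['00', '58']
print([t for t, _ in bottom])  # ['uggest', 'behavi']
```

In this toy setup the number tokens end up on the positive side. That matches the pattern above, where verse-like numerals dominate the positive logits of a feature explained as tracking biblical verse references.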
Activations Density 0.046%
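A density of 0.046% is a fraction of 0.00046, or roughly one token in every 2,200. The sketch below assumes density means the share of tokens on which the feature's activation is strictly positive, which is how feature dashboards of this kind typically report it. The helper activation_density and the synthetic one-million-token activation list are hypothetical, chosen only so the arithmetic reproduces 0.046%.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical: 1,000,000 tokens, feature fires on exactly 460 of them.
acts = [0.0] * 1_000_000
for i in range(0, 460 * 2000, 2000):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # 0.046%
```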