Explanations
references to sacred or religious themes
references to sacrilege or sacrificial concepts
Negative Logits
Token          Logit
Roosevelt      -0.67
onward         -0.64
Wonderland     -0.63
Clarkson       -0.63
Sutherland     -0.62
DonaldTrump    -0.62
Sawyer         -0.62
ocating        -0.62
ously          -0.62
onwards        -0.62
Positive Logits
Token          Logit
char           1.26
rifice         1.18
rament         1.15
het            1.09
ros            1.07
rast           1.06
hem            1.03
entric         1.03
hemy           1.02
onduct         1.01
Activations Density: 0.053%