INDEX
Explanations
references to sacred or sacrilegious concepts
Negative Logits
Token          Logit
ously          -0.77
ITNESS         -0.76
çīĪ            -0.75
DonaldTrump    -0.72
Clarkson       -0.69
Sawyer         -0.67
Palmer         -0.65
EEP            -0.63
Philips        -0.63
!/             -0.62
Positive Logits
Token          Logit
rament         1.08
het            1.07
ificial        1.03
cer            0.97
cery           0.95
rum            0.94
culus          0.94
hem            0.90
char           0.90
hest           0.89
Activations Density: 0.083%