INDEX
Explanations
proper names, particularly surnames
names and terms associated with specific political figures and entities
Negative Logits
Tsukuyomi    -0.75
Reviewer     -0.71
Metatron     -0.70
Curiosity    -0.69
tails        -0.69
Monarch      -0.69
jaws         -0.69
Archangel    -0.68
icing        -0.67
Merlin       -0.67
Positive Logits
kamp         1.66
ervative     0.86
liga         0.86
ervatives    0.86
ensation     0.85
stad         0.82
artisan      0.82
atism        0.82
sburg        0.81
fort         0.79
Activations Density: 0.013%