INDEX
Explanations
celebrity names, food-related terms, and words relating to leadership and military actions
Negative Logits
Highlander  -0.69
WAYS        -0.69
press       -0.68
borg        -0.67
TERN        -0.67
metry       -0.65
xual        -0.64
pregnant    -0.64
fill        -0.63
SHIP        -0.62
Positive Logits
inelli      1.17
irtual      1.04
ies         1.00
oice        1.00
okes        0.97
ying        0.97
oked        0.95
iley        0.94
okers       0.94
ille        0.94
Activations Density: 2.688%