INDEX
Explanations
words related to proper nouns and titles
Negative Logits
autical       -0.70
departures    -0.68
olves         -0.67
alities       -0.65
products      -0.64
iculture      -0.64
ews           -0.63
acion         -0.61
interstitial  -0.61
ancial        -0.61
Positive Logits
meanwhile       1.14
alas            1.09
incidentally    1.05
however         0.98
huh             0.94
moreover        0.93
bless           0.90
unsurprisingly  0.88
aka             0.85
beware          0.84
Activations Density 0.289%