INDEX
Explanations
proper nouns involving names
proper nouns, particularly names of people and places
Negative Logits
nesday      -0.72
ankind      -0.62
aukee       -0.60
eatures     -0.59
remem       -0.55
ascus       -0.55
mercial     -0.54
glers       -0.54
ippi        -0.54
urrencies   -0.53
Positive Logits
ansas       0.82
itect       0.79
inian       0.76
Cortex      0.70
ondo        0.68
agos        0.67
awi         0.66
INAL        0.65
ements      0.64
Spoiler     0.63
Activations Density: 0.070%