INDEX
Explanations
proper nouns associated with locations and people
Negative Logits
terday      -0.69
expires     -0.66
wip         -0.65
Adin        -0.63
laureate    -0.62
contr       -0.60
diapers     -0.60
bottleneck  -0.59
TTL         -0.59
cipl        -0.59
Positive Logits
orea        1.15
EEP         1.06
rieg        1.05
irk         1.05
essel       1.04
laus        1.04
ratom       1.01
eeper       1.00
won         0.99
ernel       0.99
Activations Density: 0.047%