INDEX
Explanations
names of places
the repetitive use of the word "are."
Negative Logits
ured      -0.70
insula    -0.67
urers     -0.65
omez      -0.65
ingen     -0.65
ulators   -0.63
uring     -0.61
isively   -0.59
shire     -0.58
BD        -0.57
Positive Logits
lli       1.36
tto       1.32
tta       1.30
nce       1.18
tsky      1.17
ndra      1.12
llo       1.11
nces      1.09
tti       1.09
tt        1.08
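On dashboards of this kind, the positive and negative logit lists are usually obtained by projecting the feature's decoder direction through the model's unembedding and keeping the highest- and lowest-scoring tokens. The page does not say how its values were computed, so the following is a minimal sketch under that assumption; `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical illustrations, not real model weights.

```python
# Hypothetical sketch: ASSUMES a feature's logit effect on a token is the dot
# product of the feature's decoder direction with that token's unembedding
# vector. This page does not confirm that convention.

def logit_effects(decoder_dir, unembed):
    """Return {token: dot(decoder_dir, unembedding vector)} for every token."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]                                  # largest first
    negative = sorted(ranked[-k:], key=lambda kv: kv[1])   # most negative first
    return positive, negative

# Toy 3-dimensional example with made-up vectors (not real model weights).
decoder_dir = [1.0, 0.5, -0.5]
unembed = {
    "lli": [1.0, 1.0, 0.0],
    "tto": [0.5, 0.5, -1.0],
    "ured": [-0.5, 0.0, 0.5],
    "shire": [0.0, -1.0, 0.0],
}

pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(pos)  # [('lli', 1.5), ('tto', 1.25)]
print(neg)  # [('ured', -0.75), ('shire', -0.5)]
```

With a real model, `unembed` would cover the full vocabulary and `k` would be 10 to match the lists above.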
Activation density: 0.048%
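Activation density is typically the fraction of tokens in an evaluation sample on which the feature fires. Assuming "fires" means an activation above zero (the page does not state its threshold), 0.048% corresponds to about 48 active tokens per 100,000, or roughly one token in 2,083. The hypothetical helper `activation_density` below makes that arithmetic concrete on synthetic activations.

```python
# Hypothetical sketch: ASSUMES density = (# tokens with activation > threshold)
# / (total tokens), with threshold 0.0. The dashboard's exact rule is not stated.

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Synthetic sample: 48 active tokens out of 100,000 (made-up data).
acts = [0.0] * 99_952 + [0.3] * 48
density = activation_density(acts)
print(f"{density:.3%}")   # 0.048%
print(round(1 / density)) # 2083, i.e. about one active token in 2,083
```

Raising `threshold` would make the reported density smaller, so two dashboards can disagree on this number if they use different cutoffs.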