INDEX
Explanations
mentions of particular states or locations
the word "the" in various contexts throughout the document
Negative Logits
tons        -0.68
packages    -0.68
artifacts   -0.65
forces      -0.65
pee         -0.64
stuff       -0.62
.—          -0.61
lift        -0.60
ansas       -0.59
pees        -0.58
Positive Logits
outset      1.34
moment      1.21
same        1.18
behest      1.14
end         1.14
beginning   1.03
forefront   0.99
conclusion  0.99
height      0.92
expense     0.91
Activations Density: 0.055%