INDEX
Explanations
names of locations
punctuation marks, specifically periods
Negative Logits
Token       Logit
orical      -0.77
imates      -0.73
INAL        -0.66
oric        -0.66
olate       -0.63
imus        -0.63
okin        -0.61
omed        -0.61
ictional    -0.60
lifes       -0.60
Positive Logits
Token       Logit
CLASSIFIED  0.83
etc         0.79
etc         0.78
EntityItem  0.74
soever      0.71
Bethlehem   0.69
ternity     0.69
ooters      0.68
taboola     0.66
Boone       0.66
Activations Density 0.030%