Explanations
dates or years in the 1900s
Negative Logits
Token         Logit
unts          -0.82
ditch         -0.78
allowance     -0.78
idle          -0.75
consistency   -0.75
hug           -0.73
extinct       -0.73
candy         -0.71
curb          -0.70
goodbye       -0.70
Positive Logits
Token         Logit
Previously    1.31
Its           1.31
Initially     1.29
Specifically  1.27
Originally    1.26
Essentially   1.26
Though        1.22
Located       1.22
It            1.21
Basically     1.20
Activations Density: 1.930%