INDEX
Explanations
numerical values and symbols in the text
Negative Logits
Token    Logit
amas     -0.85
ames     -0.84
ipal     -0.82
ire      -0.75
iph      -0.75
oms      -0.73
ops      -0.73
ilty     -0.71
eree     -0.71
osh      -0.71
Positive Logits
Token           Logit
Throughout      1.03
Later           1.02
Eventually      1.01
Nevertheless    0.99
Secondly        0.99
Initially       0.98
Earlier         0.98
Anyway          0.97
Shortly         0.95
Now             0.95
Activations Density: 0.154%