Explanations
phrases indicating a quantitative increase or upward movement
Negative Logits
ched      -0.16
ingt      -0.16
ace       -0.15
ecd       -0.15
ingly     -0.15
oral      -0.15
ystone    -0.14
tempts    -0.14
auer      -0.14
rej       -0.14
Positive Logits
otre         0.23
ward         0.20
-to          0.20
wards        0.19
rightness    0.18
ToDate       0.17
oload        0.17
holds        0.16
ozilla       0.16
Ward         0.16
Activations Density: 0.032%