INDEX
Explanations
phrases indicating growth or change in trends
Negative Logits
anning      -0.17
azen        -0.16
ilde        -0.16
ettel       -0.15
icable      -0.15
VICE        -0.14
EqualTo     -0.14
erb         -0.14
ulfilled    -0.14
onz         -0.13
Positive Logits
becoming    0.30
gaining     0.23
bec         0.20
Bec         0.19
one         0.19
seeing      0.17
among       0.17
一种        0.17
HOT         0.17
often       0.16
Activations Density 0.285%