INDEX
Explanations
phrases indicating a change or development happening up to a certain point in time
phrases indicating change or continuity over time
Negative Logits
Token                      Logit
Tears                      -0.65
``                         -0.65
Learns                     -0.64
Benefits                   -0.61
Choice                     -0.61
lez                        -0.61
umbn                       -0.60
Extension                  -0.57
BuyableInstoreAndOnline    -0.56
0010                       -0.55
Positive Logits
Token          Logit
unsus          0.74
unnoticed      0.69
belonged       0.68
DEN            0.68
hadn           0.66
unexplained    0.65
onwards        0.65
hasn           0.63
unrem          0.62
unknown        0.61
Activations Density: 0.053%