INDEX
Explanations
verbs indicating past events or actions
Negative Logits
ople          -0.69
ement         -0.64
overpowered   -0.63
amn           -0.62
pack          -0.61
overpower     -0.61
stalls        -0.60
fold          -0.59
tsy           -0.59
arte          -0.59
Positive Logits
Lastly        1.14
Similarly     1.13
Similarly     1.13
Likewise      0.97
Lastly        0.93
Finally       0.90
Likewise      0.87
Meanwhile     0.85
Another       0.82
Others        0.82
Activations Density 0.589%