INDEX
Explanations
phrases that indicate a beginning or initiation of events and activities
Negative Logits
atra    -0.16
ODB     -0.15
rush    -0.15
aket    -0.15
ắt      -0.14
hb      -0.14
inha    -0.14
ssi     -0.14
.erb    -0.14
aka     -0.14
Positive Logits
erset   0.15
uar     0.15
yr      0.15
Rocky   0.14
venir   0.14
eg      0.14
imes    0.14
::*     0.14
incl    0.14
Double  0.14
Activations Density 0.014%