Explanations
words related to temporal sequences or events that occur prior to others
Negative Logits
habet             -0.83
__":              -0.80
ſelf              -0.79
KommentareTeilen  -0.73
gany              -0.71
Jefus             -0.71
ษัท               -0.69
%)$               -0.68
gdx               -0.68
følge             -0.65
Positive Logits
before            1.96
before            1.82
Before            1.71
BEFORE            1.70
BEFORE            1.67
Before            1.63
sebelum           1.59
befo              1.37
antes             1.35
πριν              1.35
Activations Density 0.694%