INDEX
Explanations
phrases indicating the passage of time or events occurring
Negative Logits
aldi     -0.15
ATUS     -0.15
ue       -0.14
iets     -0.14
alis     -0.14
uela     -0.13
tuto     -0.13
atus     -0.13
conco    -0.13
ensen    -0.13
Positive Logits
follows      0.33
follow       0.27
follow       0.23
FOLLOW       0.21
Follow       0.21
Follow       0.20
days         0.19
following    0.19
_follow      0.19
heels        0.19
Activations Density: 0.021%