Explanations
Words that signify changes over time or cause-and-effect relationships
"Then" or similar time-related words
Negative Logits
Token         Logit
vooraf        -0.79
pourtant      -0.72
recently      -0.71
Италијани     -0.70
lately        -0.70
تقاوى         -0.70
estekak       -0.69
recently      -0.68
estimés       -0.68
Italijanski   -0.68
Positive Logits
Token         Logit
proceed       1.12
proceeded     1.12
Then          0.93
further       0.93
Then          0.91
Далее         0.90
proceeds      0.90
then          0.89
Далее         0.88
THEN          0.88
Activations Density: 0.620%
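
The page does not say how the density figure is computed. One common reading, assumed here, is that it is the fraction of sampled token positions on which the feature fires at all. The sketch below illustrates that reading only; the function name activation_density, the threshold parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active; the source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 62 of them with a nonzero activation.
sample = [0.0] * 9938 + [1.0] * 62
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.620% would mean the feature is active on roughly 62 of every 10,000 tokens, which fits a feature tied to a narrow set of words such as "Then" and "proceed".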