Explanations
phrases that indicate habitual actions or common occurrences
Negative Logits
!("{
-0.63
Kingston
-0.60
Kingston
-0.57
Jeografia
-0.54
zostało
-0.54
alex
-0.53
Maxine
-0.52
ειν
-0.52
fæ
-0.51
ХА
-0.51
Positive Logits
usually
1.08
generally
1.03
generally
1.03
Generally
1.02
Generally
1.02
Usually
1.02
Usually
1.00
ordinarily
0.98
Normally
0.94
normally
0.94
Activations Density 0.153%