Explanations
Phrases indicating a duration or period of time
Negative Logits
Token      Logit
ownik      -0.18
овари      -0.16
éŀ         -0.15
rary       -0.15
onium      -0.14
µľ         -0.14
ekil       -0.14
Setter     -0.14
inesis     -0.14
ugh        -0.14
Positive Logits
Token       Logit
initial     0.21
struggle    0.18
YPE         0.18
failed      0.18
awhile      0.17
initially   0.17
uria        0.17
deliber     0.17
false       0.16
successful  0.16
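The two lists above rank vocabulary tokens by how strongly the feature pushes their logits down or up. A minimal sketch of how such lists are commonly produced for SAE dashboards, assuming the logit effect of a feature is its decoder direction multiplied by the unembedding matrix (W_dec[feature] @ W_U, ignoring layer norm and biases). The function names, argument layout, and toy values are hypothetical and are not taken from this dashboard's implementation.

```python
# Hypothetical sketch: rank vocabulary tokens by how much a feature's
# decoder direction raises or lowers each token's logit.
# Assumption: effect on token j = sum_i decoder_direction[i] * W_U[i][j]
# (no layer norm, no bias), as is common for SAE feature dashboards.

def logit_effects(decoder_direction, unembedding, vocab):
    """Return (token, effect) pairs, one per vocabulary entry.

    decoder_direction: list of d_model floats (the feature's decoder row).
    unembedding: d_model rows, each a list of len(vocab) floats (W_U).
    vocab: list of token strings, indexed like the columns of W_U.
    """
    effects = []
    for j, token in enumerate(vocab):
        # Dot product of the feature direction with column j of W_U.
        effect = sum(decoder_direction[i] * unembedding[i][j]
                     for i in range(len(decoder_direction)))
        effects.append((token, effect))
    return effects


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])  # ascending by effect
    negative = ranked[:k]           # most negative first
    positive = ranked[::-1][:k]     # most positive first
    return positive, negative


# Toy example with made-up weights (not this feature's real weights).
vocab = ["tok_a", "tok_b", "tok_c"]
w_dec = [1.0, 2.0]
w_u = [[1.0, 0.0, -1.0],
       [0.5, -1.0, 0.0]]
positive, negative = top_and_bottom(logit_effects(w_dec, w_u, vocab), k=1)
print("positive:", positive)
print("negative:", negative)
```

With k=10 and a real vocabulary, the two returned lists would correspond to the "Positive Logits" and "Negative Logits" columns; a production version would use a matrix library rather than nested Python loops.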
Activation Density: 0.109%
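Activation density is the share of sampled tokens on which the feature fires; 0.109% works out to roughly 1.09 activating tokens per 1,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the threshold and the function name are illustrative assumptions, not this dashboard's documented definition):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: 2 active tokens out of 1,000 sampled tokens.
sample = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(sample):.3f}%")
```

A very low density like the one reported here indicates a sparse feature that is inactive on the vast majority of tokens.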