INDEX
Explanations
words indicating temporal relationships and conditions
Negative Logits
Token      Logit
asca       -0.16
erde       -0.15
Ly         -0.15
ahkan      -0.15
unn        -0.15
Pant       -0.15
приклад    -0.15
Ly         -0.15
okrat      -0.14
Walton     -0.14
Positive Logits
Token      Logit
oba        0.17
ujet       0.16
ushman     0.15
Somerset   0.15
inde       0.15
ighton     0.14
spol       0.14
usu        0.14
ighting    0.14
oby        0.14
Activations Density 0.043%