INDEX
Explanations
phrases that indicate conditions or situations related to change or development
Negative Logits
ɵ          -0.16
indo       -0.15
ouro       -0.15
ewart      -0.14
conds      -0.13
igest      -0.13
stitute    -0.13
orners     -0.13
ufen       -0.13
aska       -0.13
Positive Logits
arken       0.17
isine       0.17
beck        0.15
mere        0.15
istrovství  0.15
addock      0.14
robat       0.14
ebek        0.14
attice      0.13
usan        0.13
Activations Density 0.146%