INDEX
Explanations
terms related to combating negative effects or risks to health and the environment
Negative Logits
haled    -0.17
oser     -0.17
elves    -0.16
füh      -0.16
ehler    -0.16
ollar    -0.15
LEAR     -0.15
ulings   -0.14
açı      -0.14
uras     -0.14
Positive Logits
ewear        0.15
Jvm          0.15
ong          0.14
Ùħس          0.14
ysi          0.14
catch        0.14
çĥĪ          0.13
Acquisition  0.13
nhấn         0.13
542          0.13
Activations Density 0.648%