INDEX
Explanations
expressions of approval or positivity
Negative Logits
disambiguazione  -0.71
Савезне          -0.68
informée         -0.64
étoit            -0.62
estekak          -0.60
avoient          -0.60
afficheront      -0.60
propOrder        -0.60
ujednoznacz      -0.59
parsedMessage    -0.58
Positive Logits
Niche         0.72
niche         0.64
counters      0.58
dem           0.57
confidential  0.55
ache          0.54
catch         0.52
garage        0.52
green         0.51
pos           0.50
Activations Density: 0.686%