Explanations
expressions indicating the possibility or inevitability of negative outcomes
Negative Logits
findpost          -0.74
betweenstory      -0.69
featureID         -0.68
PreferredItem     -0.67
nakalista         -0.63
ainfi             -0.62
ujednoznacz       -0.61
Италијани         -0.60
tartalomajánló    -0.58
Administrativna   -0.57
Positive Logits
tetap             0.55
sekal             0.50
still             0.43
still             0.42
ändå              0.41
comunque          0.40
trotzdem          0.40
Dennoch           0.38
Still             0.37
それでも          0.37
Activations Density 0.381%