INDEX
Explanations
phrases indicating contrast or concession
NEGATIVE LOGITS
Portale          -0.70
PreferredItem    -0.69
findpost         -0.65
ModelExpression  -0.56
ainfi            -0.55
ItemBackground   -0.53
AndEndTag        -0.53
Jereo            -0.52
feroit           -0.52
Nuorodos         -0.50
POSITIVE LOGITS
Nevertheless      0.70
Nevertheless      0.70
nonetheless       0.70
Trotzdem          0.69
nevertheless      0.69
trotzdem          0.67
それでも          0.65
Dennoch           0.61
Nonetheless       0.61
still             0.61
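The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists could be produced, assuming the effect is the feature's decoder row multiplied by the model's unembedding matrix with no LayerNorm folding (the function name, argument names, and the shapes are illustrative assumptions, not the dashboard's actual code):

```python
import numpy as np

def top_logit_effects(decoder_row, unembed, vocab, k=10):
    """Rank tokens by the feature's direct effect on the output logits.

    Assumption (hypothetical): effect = decoder_row @ unembed, where
    decoder_row has shape (d_model,) and unembed has shape (d_model, vocab_size).
    """
    effects = decoder_row @ unembed          # one value per vocabulary token
    order = np.argsort(effects)              # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
row = np.array([1.0, 0.0])
W_U = np.array([[0.7, -0.5, 0.1],
                [0.0,  0.0, 0.0]])
neg, pos = top_logit_effects(row, W_U, ["Nevertheless", "Nuorodos", "still"], k=1)
print(neg, pos)
```

Under this assumption, contrast markers such as "Nevertheless" would score high because the feature's decoder direction aligns with their unembedding vectors.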
Activation density: 0.366%
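Activation density is usually read as the share of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the threshold and the function name are illustrative assumptions): a density of 0.366% corresponds to roughly 366 active positions per 100,000 tokens.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption (hypothetical): a feature counts as active when its
    activation is strictly greater than the threshold.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# 366 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:366] = 1.0
print(f"{activation_density(acts):.3%}")
```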