Explanations
statements expressing negativity or dislike
Negative Logits
Token             Logit
avoient           -0.49
étoit             -0.46
étoient           -0.45
ModelExpression   -0.45
otomatig          -0.42
disambiguazione   -0.41
kiệm              -0.41
verwijspagina     -0.41
auroit            -0.40
(blank)           -0.40
Positive Logits
Token             Logit
ogóle             0.68
Totally           0.64
bentar            0.62
Totally           0.60
vůbec             0.59
affatto           0.59
Really            0.59
totally           0.59
totally           0.58
Really            0.57
Activations Density 0.004%