INDEX
Explanations
affirmations or expressions of agreement
Negative Logits
Bul        -0.78
bul        -0.70
"..\..\    -0.67
mapsto     -0.66
</em>      -0.66
Bul        -0.66
Koz        -0.65
dal        -0.65
presence   -0.64
Rox        -0.64
Positive Logits
Agree      2.03
agrees     1.89
agree      1.84
agree      1.75
Agree      1.70
Disagree   1.67
agreeing   1.66
Agreed     1.65
Agre       1.64
AGRE       1.64
Activations Density: 0.087%