Explanations
terms and phrases indicating neutrality or a lack of bias
Negative logits (tokens whose logits the feature most decreases)
Token           Logit
expandindo      -0.85
devenus         -0.77
sanguí          -0.72
rencont         -0.71
africains       -0.69
Advancement     -0.69
Complexity      -0.68
loem            -0.68
примеча         -0.68
ActionCreators  -0.67
Positive logits (tokens whose logits the feature most increases)
Token           Logit
neutral         3.62
Neutral         3.26
neutral         3.18
Neutral         3.13
UTRAL           2.69
neutr           2.58
нейтра          2.52
neutrality      2.51
neutrals        2.50
neutre          2.40
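A minimal sketch of how logit lists like these are commonly derived. It assumes they are the "direct logit" effect of the feature: the feature's decoder direction is projected through the model's unembedding matrix, and the most negative and most positive entries are reported. The function and parameter names (`top_logit_tokens`, `w_dec_row`, `w_u`) are hypothetical, and any final layer-norm scaling the real pipeline may apply is ignored here.

```python
import numpy as np


def top_logit_tokens(w_dec_row, w_u, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    w_dec_row: (d_model,) decoder direction of the feature (assumed input).
    w_u:       (d_model, d_vocab) unembedding matrix (assumed input).
    vocab:     list of d_vocab token strings.
    Returns (negative, positive): k (token, logit) pairs each, most extreme first.
    """
    logits = w_dec_row @ w_u          # (d_vocab,) logit change per token
    order = np.argsort(logits)        # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
w_u = np.array([[1.0, -1.0, 0.5, 0.0],
                [0.0, 2.0, 0.0, -3.0]])
w_dec_row = np.array([1.0, 0.5])
neg, pos = top_logit_tokens(w_dec_row, w_u, ["a", "b", "c", "d"], k=2)
print(neg)  # [('d', -1.5), ('b', 0.0)]
print(pos)  # [('a', 1.0), ('c', 0.5)]
```

Sorting once and reading both ends of the order gives the negative and positive lists from a single pass over the vocabulary.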
Activation density: 0.103% (share of sampled tokens on which the feature is active)
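The density figure is a simple ratio. A minimal sketch, assuming "active" means an activation strictly above zero (the threshold used by any particular dashboard is an assumption here, as is the hypothetical function name `activation_density`):

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# One active position out of 1,000 gives a density of 0.1%,
# the same order of magnitude as the 0.103% reported above.
print(activation_density([0.0] * 999 + [2.0]))
```

A density around 0.1% indicates a sparse feature that fires on roughly one token in a thousand.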