INDEX
Explanations
positive and negative sentiments towards various topics or entities
phrases reflecting attitudes and evaluations, particularly about kindness and negativity
Negative Logits
UNCH           -0.78
ĸļ             -0.73
impossibility  -0.73
utter          -0.70
igsaw          -0.68
rame           -0.66
ulo            -0.62
alsa           -0.60
igs            -0.60
rossover       -0.60
Positive Logits
toward         1.16
towards        1.12
Towards        0.86
disposed       0.85
relations      0.85
itism          0.82
attitude       0.77
gays           0.73
Semitic        0.71
demeanor       0.70
Activations Density 0.440%