Explanations
phrases or words indicating opposition or contrast
phrases indicating contrast or opposition
New Auto-Interp
Negative Logits
Token         Logit
ITED          -0.76
adows         -0.76
Beans         -0.75
Annotations   -0.73
lov           -0.72
Jump          -0.72
utters        -0.72
acca          -0.70
Mush          -0.70
beans         -0.70
Positive Logits
Token         Logit
side           0.98
sexes          0.98
sides          0.96
direction      0.87
sex            0.87
extremes       0.86
hemisphere     0.86
gender         0.82
poles          0.80
ends           0.77
Activation Density: 0.045%