Explanations
phrases related to opposing viewpoints or controversies
Negative Logits
Token       Logit
xus         -0.89
frac        -0.73
Caption     -0.72
kamp        -0.70
ilit        -0.69
uez         -0.68
ventus      -0.66
nect        -0.65
renheit     -0.64
idation     -0.64
Positive Logits
Token           Logit
sexes           1.41
genders         1.18
sides           1.17
halves          0.92
domestically    0.82
orally          0.81
verbally        0.80
individually    0.74
internally      0.74
anecd           0.71
Activations Density: 0.045%