Explanations
adjectives conveying a strong and clear contrast
the word "stark" and its variations in contexts highlighting contrasts or extremes
Negative Logits
uthor        -0.72
annis        -0.71
uters        -0.70
ipop         -0.70
diligently   -0.68
phis         -0.67
PU           -0.67
safely       -0.66
hops         -0.65
hemor        -0.64
Positive Logits
contrasts     1.12
ly            1.04
contrast      1.03
stark         0.88
naked         0.82
est           0.81
iary          0.76
contrasting   0.75
difference    0.74
er            0.74
Activation Density: 0.013%