Explanations
words related to comparisons or contrasts
Negative Logits
Token       Logit
hood        -0.70
sat         -0.69
SPONSORED   -0.64
³³³³³³³³    -0.63
deen        -0.63
fair        -0.62
sylvania    -0.62
mouth       -0.62
sov         -0.60
she         -0.59
Positive Logits
Token       Logit
regards     2.02
regard      1.90
respect     1.52
draw        1.47
stood       1.44
standing    1.27
drawn       1.22
holding     1.19
impunity    1.12
hindsight   1.03
Activations Density 0.189%