Explanations
phrases related to opposing views or choices
concepts related to divergence and polarity in viewpoints
Negative Logits
Token          Logit
Annotations    -0.73
Delivery       -0.71
hey            -0.70
Regist         -0.64
Register       -0.64
Tickets        -0.63
urga           -0.62
Install        -0.62
ogo            -0.61
Emb            -0.60
Positive Logits
Token          Logit
oppos          1.08
sexes          1.07
genders        0.95
viewpoints     0.91
extremes       0.90
inational      0.86
twins          0.84
halves         0.84
disparate      0.84
perspectives   0.84
Activations Density: 0.074%