INDEX
Explanations
alternative options or solutions to existing choices
references to alternatives in various contexts
Negative Logits
hips      -0.96
older     -0.80
haw       -0.77
encers    -0.75
ochet     -0.70
girls     -0.70
artney    -0.70
Saud      -0.68
awar      -0.68
Chicken   -0.67
Positive Logits
alternative    1.05
Altern         0.99
alternatives   0.97
atives         0.86
solutions      0.84
viewpoints     0.84
options        0.79
altern         0.79
Alternative    0.79
explanations   0.76
Activations Density: 0.015%