Explanations
phrases related to alternative options or choices
references to alternative options or viewpoints
Negative Logits
Token         Logit
hips          -0.98
haw           -0.84
older         -0.80
girls         -0.74
ahon          -0.74
encers        -0.73
artney        -0.72
raq           -0.71
Saud          -0.71
ochet         -0.70
Positive Logits
Token         Logit
Altern        1.00
alternative   0.96
alternatives  0.94
atives        0.88
solutions     0.83
viewpoints    0.82
options       0.78
explanations  0.78
therapies     0.76
perspectives  0.76
Activations Density: 0.017%