INDEX
Explanations
phrases related to contrasting choices or actions
phrases indicating exclusivity or alternatives
Negative Logits
Token       Logit
oro         -0.69
amon        -0.67
},"         -0.63
rament      -0.62
eg          -0.61
âĿ          -0.61
ender       -0.61
Sunshine    -0.60
atur        -0.60
izens       -0.59
Positive Logits
Token           Logit
preferring      0.91
implying        0.84
suggesting      0.84
culminating     0.79
skipping        0.77
christ          0.76
guaranteeing    0.76
excluding       0.76
allowing        0.74
adolesc         0.72
Activations Density: 0.361%
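
A minimal sketch of how an activation density figure like the one above could be computed, assuming it means the fraction of token positions on which the feature fires (activation above zero). The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration; the dashboard's actual sampling and thresholding are not stated in the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature is active. The source does not define it explicitly.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 361 of 100,000 token positions.
sample = [1.5] * 361 + [0.0] * (100_000 - 361)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density well below 1% describes a sparse feature that is silent on the vast majority of tokens.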