Explanations
phrases related to outcomes or consequences
instances of the phrase "end up with"
Negative Logits
cent       -0.67
tenance    -0.67
former     -0.61
late       -0.61
she        -0.61
orah       -0.59
chan       -0.58
ati        -0.58
thing      -0.58
hops       -0.58
Positive Logits
drawn       1.06
regards     1.01
regard      1.01
impunity    0.98
respect     0.98
draw        0.95
dignity     0.89
stood       0.83
standing    0.83
あ          0.82
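The two lists above rank output tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists are commonly derived, assuming the feature's effect on a token is the dot product of its decoder direction with that token's unembedding column (direct logit attribution). The vectors, vocabulary, and helper names below are hypothetical toy values, not this feature's real weights.

```python
# Hypothetical toy setup: a 3-dimensional residual stream and a 5-token vocabulary.
# Assumption: logit effect of a feature on a token = decoder_direction . W_U[:, token].

def logit_effects(decoder_direction, unembedding, vocab):
    """Return {token: effect}, the dot product of the decoder direction
    with each token's unembedding column."""
    effects = {}
    for j, token in enumerate(vocab):
        column = [row[j] for row in unembedding]
        effects[token] = sum(d * w for d, w in zip(decoder_direction, column))
    return effects


def top_and_bottom(effects, k):
    """Return (k most negative tokens, k most positive tokens) with their effects."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


vocab = ["drawn", "regard", "cent", "she", "thing"]
decoder_direction = [1.0, 0.5, -1.0]
# Unembedding with shape (d_model=3, vocab=5): one column per token.
unembedding = [
    [0.9, 0.6, -0.2, 0.1, 0.0],
    [0.4, 0.8, 0.0, -0.6, 0.2],
    [-0.1, 0.0, 0.5, 0.4, 0.3],
]

effects = logit_effects(decoder_direction, unembedding, vocab)
negative, positive = top_and_bottom(effects, k=2)
print([t for t, _ in negative])  # ['cent', 'she']
print([t for t, _ in positive])  # ['drawn', 'regard']
```

Ranking by the raw dot product keeps the sketch simple; a real dashboard may normalize vectors or apply a final layer norm first, which this sketch does not model.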
Activation Density: 0.130%
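Activation density is usually reported as the fraction of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; the threshold and the sample activations are hypothetical. Under that reading, 0.130% corresponds to roughly 13 active tokens in every 10,000.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 10,000 tokens, 13 of which have a positive activation.
activations = [0.0] * 10_000
for i in range(0, 13 * 700, 700):  # 13 evenly spaced firing positions
    activations[i] = 2.5

density = activation_density(activations)
print(f"{density:.3%}")  # 0.130%
```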