Explanations
phrases related to facing challenges or negative outcomes
phrases indicating significant actions or changes
Negative Logits
Token      Logit
ndra       -0.76
orth       -0.75
bably      -0.71
osity      -0.70
-+-+       -0.70
ulty       -0.70
arth       -0.67
anni       -0.67
legates    -0.66
apo        -0.66
Positive Logits
Token      Logit
seriously  0.98
lightly    0.87
aback      0.80
virginity  0.80
hostage    0.78
Seriously  0.77
stride     0.77
plunge     0.76
cue        0.76
reins      0.75
Activations Density: 0.255%