Explanations
phrases related to not giving up or showing determination
expressions indicating resistance or retreating behavior
Negative Logits
marks      -0.72
usable     -0.67
tained     -0.64
Fam        -0.64
anon       -0.63
lain       -0.61
burning    -0.59
frey       -0.59
marked     -0.59
oret       -0.59
Positive Logits
blindly      0.81
timid        0.80
inaction     0.76
cowardly     0.75
olicy        0.74
conced       0.74
retreat      0.72
retreating   0.71
apologise    0.71
hesitate     0.71
Activations Density 0.116%