INDEX
Explanations
Phrases related to attempts or the act of trying
Instances of the word "attempted" or its variations
Negative Logits
Token          Logit
spr            -0.81
hander         -0.80
minus          -0.76
front          -0.74
eyes           -0.73
houses         -0.73
sheet          -0.70
lined          -0.69
father         -0.67
scene          -0.65
Positive Logits
Token          Logit
unsuccessfully  1.10
ossibility      0.84
Attempts        0.77
heric           0.76
llor            0.75
URES            0.75
suicide         0.75
ossible         0.74
querque         0.73
resusc          0.72
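The page does not say how the logit values are produced. A common convention for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the most negative and most positive tokens. The sketch below assumes that convention. `top_logit_effects` and its inputs are hypothetical, and the toy numbers are illustrative only, not taken from this feature.

```python
def top_logit_effects(decoder_vec, unembed, vocab, k=10):
    """Direct logit effect of a feature (assumed convention, hypothetical helper).

    decoder_vec: list[float], the feature's decoder direction (length d_model)
    unembed: list of d_model rows, each a list[float] of length vocab_size
    vocab: list[str], token strings indexed like the unembedding columns
    Returns (top_negative, top_positive) as lists of (token, logit) pairs.
    """
    d_model = len(decoder_vec)
    # logit[j] = sum_i decoder_vec[i] * unembed[i][j]  (vector-matrix product)
    logits = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort tokens by logit; the two ends give the negative and positive tables.
    ascending = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    top_negative = ascending[:k]
    top_positive = ascending[::-1][:k]
    return top_negative, top_positive


# Toy example: d_model = 2, vocabulary of three tokens.
neg, pos = top_logit_effects(
    [1.0, -0.5],
    [[0.2, -0.4, 0.9], [0.6, 0.1, -0.2]],
    ["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

Sorting once and reading both ends keeps the negative and positive lists consistent with each other. Real dashboards would do the product with a tensor library over a vocabulary of tens of thousands of tokens, which is why subword fragments such as "spr" or "querque" can appear in the tables.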
Activation Density: 0.026%
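Activation density is usually read as the share of sampled token positions on which the feature fires. A density of 0.026% is 0.00026 as a fraction, about 26 firing positions per 100,000 tokens, so this is a sparse feature. The sketch below assumes "fires" means an activation strictly above zero. The function name and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    activations: list[float], one feature activation per sampled token
    threshold: assumed firing cutoff (0.0 means "any positive activation")
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 26 firing positions out of 100,000 sampled tokens.
sample = [0.0] * 99_974 + [1.5] * 26
print(activation_density(sample))
```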