INDEX
Explanations
words related to intent or attempt
the repeated phrase "trying to."
Negative Logits
Token        Logit
dylib        -0.72
thus         -0.66
eye          -0.66
cised        -0.65
requisite    -0.62
head         -0.62
oola         -0.62
lav          -0.61
ificantly    -0.61
ifa          -0.60
Positive Logits
Token             Logit
unsuccessfully    1.06
desperately       0.94
harder            0.82
vain              0.74
ioned             0.68
frantically       0.68
ichick            0.66
impe              0.66
ove               0.65
tried             0.65
Activations Density 0.045%
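
As a rough illustration of how a density figure like the one above could be read, the sketch below assumes that activation density means the percentage of tokens on which the feature fires (activation above zero). That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; the page does not state how the figure is computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    activation. The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 9 of 20,000 tokens.
sample = [0.0] * 19_991 + [1.3] * 9
print(f"{activation_density(sample):.3f}%")  # prints "0.045%"
```

Under this reading, a density of 0.045% would mean the feature is active on roughly 1 token in every 2,200, which fits a feature tied to a narrow pattern such as "trying to."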