Explanations
phrases related to attempting or striving
instances of the word "trying."
Negative Logits
dylib     -0.76
oola      -0.68
anon      -0.64
lav       -0.61
cised     -0.60
ros       -0.60
thus      -0.59
ifa       -0.59
spread    -0.58
friends   -0.58
Positive Logits
unsuccessfully   1.07
desperately      0.98
harder           0.90
ioned            0.73
reprene          0.72
vain             0.71
valiant          0.71
frantically      0.70
ggle             0.66
ison             0.66
Activations Density: 0.050%