INDEX
Explanations
instances of attempted actions or interventions
instances of the word "tried"
Negative Logits
scribe       -0.77
head         -0.76
inant        -0.67
cedented     -0.66
Quality      -0.66
ificantly    -0.65
Production   -0.64
thus         -0.63
hi           -0.62
requisite    -0.62
Positive Logits
unsuccessfully   1.36
valiant          0.81
desperately      0.81
harder           0.78
vain             0.72
nces             0.72
repeatedly       0.71
tried            0.69
Pett             0.69
nesday           0.69
Activations Density 0.049%