Explanations
mentions of the name "Don."
Negative Logits
View      -0.67
C         -0.66
In        -0.64
op        -0.62
<eos>     -0.61
de        -0.61
Before    -0.60
opo       -0.59
F         -0.59
The       -0.58
Positive Logits
Isn        1.45
Isn        1.44
doesn      1.34
Shouldn    1.33
Wasn       1.31
wouldn     1.31
Aren       1.30
Doesn      1.30
weren      1.30
shouldn    1.29
Activations Density 0.156%