Explanations
instances of the word "does"
Negative Logits
Token             Logit
Leal              -0.69
ing               -0.67
tmpl              -0.63
artement          -0.62
Gull              -0.62
<<<<<<<<<<<<<<    -0.60
McCartney         -0.60
merid             -0.59
▾                 -0.59
اعدة              -0.59
Positive Logits
Token             Logit
Does              1.91
does              1.86
Does              1.85
does              1.78
DOES              1.72
DOES              1.61
DID               1.19
do                1.15
doe               1.13
Did               1.12
Activations Density: 0.127%
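
The page does not define the density figure. A common reading, assumed here, is the fraction of sampled token positions on which the feature has a nonzero activation. The sketch below illustrates that reading only; the function name, the `threshold` parameter, and the sample data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 127 of them active.
sample = [0.0] * 99_873 + [1.5] * 127
density = activation_density(sample)
print(f"{density:.3%}")
```

With this made-up sample the script prints `0.127%`, matching the scale of the figure above: the feature would fire on roughly one token in every 800.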