INDEX
Explanations
instances where an action is being performed or a comparison is made
phrases containing forms of the verb "do", such as "do" and "did"
Negative Logits
cipled       -0.69
urst         -0.63
Published    -0.60
asta         -0.60
iken         -0.59
SAR          -0.58
ussed        -0.58
Provided     -0.57
ouple        -0.57
bent         -0.57
Positive Logits
pez           0.86
ettings       0.72
zed           0.72
aughters      0.68
zing          0.67
mosqu         0.66
jet           0.57
llor          0.57
Downloadha    0.57
whenever      0.56
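The two lists above are the tokens with the most negative and most positive scores. As a minimal sketch, the following assumes each token already has a single scalar logit score; this page does not say how those scores are computed. Under that assumption, the ranking reduces to a top-k selection in each direction. The function name `top_logits` is hypothetical.

```python
import heapq

def top_logits(contribs: dict[str, float], k: int) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs.

    Hypothetical helper: assumes one precomputed scalar score per token.
    """
    # nsmallest returns results in ascending order (most negative first).
    negative = heapq.nsmallest(k, contribs.items(), key=lambda kv: kv[1])
    # nlargest returns results in descending order (most positive first).
    positive = heapq.nlargest(k, contribs.items(), key=lambda kv: kv[1])
    return negative, positive

# A small subset of the values listed above.
sample = {"cipled": -0.69, "urst": -0.63, "pez": 0.86, "ettings": 0.72, "jet": 0.57}
neg, pos = top_logits(sample, 2)
print(neg)  # [('cipled', -0.69), ('urst', -0.63)]
print(pos)  # [('pez', 0.86), ('ettings', 0.72)]
```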
Activation density: 0.056%
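To put the density figure in concrete terms, here is a sketch that assumes "activation density" means the percentage of sampled token positions where the feature's activation is above zero. The page does not define the term, and both the threshold and the helper name `activation_density` are assumptions. Under this reading, 0.056% corresponds to roughly 56 active positions per 100,000 tokens.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes density = (#active positions / #positions) * 100.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical sample: 56 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(56):
    acts[i * 1000] = 1.5  # distinct positions 0, 1000, ..., 55000
print(f"{activation_density(acts):.3f}%")  # 0.056%
```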