Negative Logits
Token         Logit
was           -0.10
were          -0.09
could         -0.09
would         -0.08
WAS           -0.07
could         -0.07
had           -0.07
hadn          -0.07
Was           -0.07
isn           -0.07

Positive Logits
Token         Logit
acomment       0.07
/scripts       0.07
strt           0.07
нг             0.07
.di            0.06
writing        0.06
ано            0.06
.det           0.06
avoided        0.06
temperament    0.06
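The sketch below shows one way a list like this could be produced. It assumes the convention common in sparse-autoencoder dashboards: each token's logit is the feature's decoder direction projected through the model's unembedding matrix, and the lists keep the k most negative and k most positive values. The function names and the toy inputs are hypothetical, and this is not necessarily how this dashboard computed its numbers.

```python
import heapq


def feature_logits(decoder_direction, unembedding):
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix given as d_model rows x vocab_size columns.
    Returns one logit contribution per vocabulary token (assumed convention)."""
    d_model = len(decoder_direction)
    vocab_size = len(unembedding[0])
    return [
        sum(decoder_direction[i] * unembedding[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_and_bottom_logits(vocab, logits, k=10):
    """Return (negative, positive): the k most negative and the k most
    positive (token, logit) pairs. Ties keep vocabulary order."""
    pairs = list(zip(vocab, logits))
    negative = heapq.nsmallest(k, pairs, key=lambda p: p[1])
    positive = heapq.nlargest(k, pairs, key=lambda p: p[1])
    return negative, positive


# Toy example with made-up values (not taken from the dashboard).
vocab = ["was", "were", "writing", "avoided", "the"]
logits = [-0.10, -0.09, 0.06, 0.06, 0.0]
neg, pos = top_and_bottom_logits(vocab, logits, k=2)
print(neg)  # [('was', -0.1), ('were', -0.09)]
print(pos)  # [('writing', 0.06), ('avoided', 0.06)]
```

`heapq.nsmallest`/`nlargest` avoid sorting the whole vocabulary. That matters when the vocabulary has tens of thousands of tokens and only the top ten are shown.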
Activation density: 0.856%
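Activation density is usually reported as the share of token positions in a sample dataset where the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero. The dashboard does not state its threshold or dataset here, and the function name is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`
    (assumed definition; a threshold of 0.0 counts any positive activation)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: 856 active positions out of 100,000 gives roughly 0.856%.
acts = [0.0] * 99_144 + [1.0] * 856
print(round(activation_density(acts), 3))  # 0.856
```

A density below 1% like this one means the feature is sparse: it is inactive on the vast majority of tokens.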