INDEX
Negative Logits
ibur        -0.79
verbs       -0.73
iop         -0.72
kered       -0.69
ebook       -0.68
flix        -0.65
ibu         -0.65
Cola        -0.64
essee       -0.63
ella        -0.63
Positive Logits
undone       0.80
from         0.78
ments        0.69
airport      0.69
doms         0.66
departure    0.63
ment         0.62
depart       0.62
untled       0.61
Wast         0.61
Activations Density: 0.026%