INDEX
NEGATIVE LOGITS
Token      Logit
naire      -0.68
ctuary     -0.67
BaseType   -0.66
lication   -0.65
Phar       -0.65
Rules      -0.63
leaflets   -0.63
Lans       -0.63
ativity    -0.62
rogram     -0.61
POSITIVE LOGITS
Token      Logit
embed      0.91
dp         0.83
TY         0.79
share      0.78
gg         0.75
dn         0.74
gp         0.72
lav        0.71
deck       0.71
HT         0.71
Activations Density 2.962%