NEGATIVE LOGITS
preschool     -0.07
insults       -0.06
patents       -0.06
ownership     -0.06
viewport      -0.06
withheld      -0.06
rede          -0.06
citizenship   -0.06
新            -0.06
deviations    -0.06
POSITIVE LOGITS
drama     0.18
Drama     0.16
dramas    0.14
rama      0.11
Dram      0.09
Blade     0.07
/gpl      0.07
.Dense    0.07
ドラ      0.07
rez       0.07
Activations Density 0.004%