INDEX
Negative Logits
l         -0.75
发表于    -0.75
Le        -0.73
{\        -0.71
bil       -0.71
la        -0.70
Re        -0.70
Bil       -0.69
d         -0.69
Gru       -0.69
Positive Logits
Intake    1.45
Intake    1.34
intake    1.30
intake    1.27
intakes   1.15
Theſe     1.01
ñores     0.97
Efq       0.95
myſelf    0.94
Jefus     0.93
Activations Density 0.004%