INDEX
NEGATIVE LOGITS
tow            -0.10
Tow            -0.08
Diplom         -0.08
lag            -0.08
thrift         -0.08
seen           -0.07
rena           -0.07
Dipl           -0.07
ade            -0.07
diplomacy      -0.07
POSITIVE LOGITS
thresholds     0.11
处分           0.11
milestones     0.10
divides        0.09
partitions     0.09
jun            0.09
segmentation   0.08
划             0.08
Threshold      0.08
Threshold      0.08
Activations Density 0.026%