INDEX
NEGATIVE LOGITS
_FAMILY     -0.29
入场        -0.27
usan        -0.27
asn         -0.26
horn        -0.26
filled      -0.26
фор         -0.25
family      -0.24
depleted    -0.24
Family      -0.24
POSITIVE LOGITS
instruction    0.31
instruction    0.29
:red           0.28
amba           0.27
reste          0.27
Instruction    0.26
strugg         0.26
号             0.26
-alist         0.26
备             0.25
Activations Density: 0.058%