INDEX
Negative Logits
nity      0.39
Remark    0.38
verr      0.38
Flare     0.38
ism       0.37
性に      0.37
flags     0.35
remark    0.35
Tech      0.35
curl      0.35
POSITIVE LOGITS
*}{        0.59
*}         0.57
*}$        0.50
*}         0.49
unlabeled  0.49
''){       0.48
summarize  0.47
summarise  0.46
*}\        0.46
phrasing   0.45
Activations Density 0.001%