Negative Logits
Hold      0.45
伤        0.45
Fail      0.41
勵        0.40
쨩        0.40
壓        0.40
stav      0.39
المش      0.39
傷        0.39
瘩        0.39
Positive Logits
jargon    0.59
d         0.58
kenn      0.50
md        0.50
algebra   0.50
col       0.49
lau       0.48
sax       0.47
tabular   0.47
biography 0.47
Activation density: 0.001%