Negative Logits
Clearly        0.36
svom           0.35
educating      0.35
ructose        0.35
both           0.34
rodu           0.34
╝              0.34
diketahui      0.33  (Indonesian, "known")
esperaba       0.33  (Spanish, "was expecting")
enties         0.33
Positive Logits
key            0.75
crux           0.70
reason         0.69
difficulty     0.66
关键           0.66  (Chinese, "key")
trick          0.64
tricky         0.62
usefulness     0.62
misconception  0.61
핵심           0.60  (Korean, "core, crux")
Activation density: 0.013%