INDEX
NEGATIVE LOGITS
hereditary    -0.07
contextual    -0.07
belakang      -0.07
dwar          -0.07
izabeth       -0.07
Post          -0.07
ftar          -0.07
输出          -0.07
Nom           -0.07
.wrapper      -0.07
POSITIVE LOGITS
rewarded      0.12
rewarding     0.11
removed       0.11
remover       0.10
reward        0.10
Removed       0.10
rewards       0.10
removal       0.10
triggered     0.10
remov         0.10
Activations Density 0.005%