INDEX
NEGATIVE LOGITS
dict        -0.08
cases       -0.07
olicies     -0.07
Shipment    -0.07
terlihat    -0.07
.look       -0.07
домов       -0.07
Ship        -0.07
.respond    -0.07
            -0.07
POSITIVE LOGITS
harmful     0.11
radicals    0.10
damaging    0.10
oxidative   0.10
pathogens   0.10
生成        0.09
wreak       0.09
harming     0.09
攻击        0.09
químicos    0.09
Activations Density 0.004%