INDEX
Explanations
references to economic exploitation and slavery
Negative Logits
(missing)         -0.76
بيها              -0.63
numerus           -0.62
AddHtmlAttribute  -0.60
illées            -0.59
itieren           -0.55
hamilan           -0.55
typhoid           -0.54
miert             -0.54
saliva            -0.54
Positive Logits
surplus           0.76
leftover          0.63
discarded         0.61
queryInterface    0.56
repur             0.55
Surplus           0.53
rejected          0.52
掉的              0.51
Lef               0.50
unwanted          0.50
Activations Density 0.285%