INDEX
NEGATIVE LOGITS
spect        -0.09
innings      -0.08
Men's        -0.08
atek         -0.08
schein       -0.08
lighter      -0.08
pir          -0.08
statu        -0.07
announce     -0.07
PCS          -0.07
POSITIVE LOGITS
掉            0.09
rt           0.09
GPT          0.08
িদ্ধ          0.08
BR           0.08
કો           0.08
trustworthy  0.07
予            0.07
гай          0.07
GPT          0.07
Activation density: 0.001%