INDEX
NEGATIVE LOGITS
dishonesty  0.41
也要  0.41
också  0.40
costruire  0.40
scared  0.40
unethical  0.39
اجمعين  0.39
splitContainer  0.39
beth  0.38
također  0.38
POSITIVE LOGITS
根據  0.53
根据  0.49
通过  0.47
благодаря  0.47
according  0.47
sayesinde  0.46
leveraging  0.46
utilizing  0.45
باستخدام  0.45
凭借  0.44
Activations Density 0.027%