INDEX
Explanations
challenging or criticizing others
Negative Logits
മം          0.39
அதிச        0.39
inscri      0.38
ezingu      0.38
bahawa      0.38
whak        0.38
Tämä        0.37
Labrenzia   0.36
Confira     0.36
vielf       0.36
Positive Logits
someone     0.51
suppliers   0.51
him         0.49
相手        0.45
politicians 0.45
对方        0.44
অন্যদের     0.44
誰          0.43
eseorang    0.42
iemand      0.42
Activations Density 0.150%