INDEX
Explanations
negative expressions about respect and opinions in social contexts
Negative Logits
mektedir          -0.80
maktadır          -0.64
rsiniz            -0.57
<eos>             -0.52
almendras         -0.47
venons            -0.47
awaiter           -0.47
soggetto          -0.46
cination          -0.45
美味しかったです  -0.45
Positive Logits
Савезне           0.94
InputBorder       0.84
itſelf            0.82
                  0.81
новниш            0.80
ſelf              0.78
:]:               0.78
faſt              0.77
)':               0.76
transfieras       0.75
Activations Density: 0.329%