INDEX
Explanations
terms related to punishment and its variations
Negative Logits
Token       Logit
¥¤          -0.15
گرÛĮ        -0.15
editary     -0.15
KIT         -0.15
hrs         -0.15
esk         -0.14
ullah       -0.14
füh         -0.14
oga         -0.14
outes       -0.14
Positive Logits
Token       Logit
jabi        0.24
185         0.22
pun         0.21
ning        0.19
ks          0.18
ishments    0.17
pun         0.16
erli        0.16
sters       0.15
ishment     0.15
Activations Density 0.011%