INDEX
Explanations
phrases indicating accusation or blame
Negative Logits
verrez   -0.42
korun    -0.40
issue    -0.39
Ainda    -0.38
decker   -0.37
the      -0.36
interno  -0.36
issue    -0.35
figliu   -0.34
taban    -0.34
Positive Logits
[token not captured]  0.69
andExpect             0.66
autorytatywna         0.66
utafitiHapana         0.65
guilty                0.65
мәкал                 0.63
propOrder             0.62
Guilty                0.62
(&:                   0.61
guilty                0.60
Activations Density 0.020%