Explanations
instances of physical restraint or aggression
Negative Logits
wts           -0.56
виправивши    -0.56
Single        -0.55
single        -0.54
уза           -0.54
kasarigan     -0.51
Single        -0.51
窟            -0.50
ervazione     -0.50
single        -0.49
Positive Logits
ANTLR            0.74
hugs             0.74
cuddle           0.73
RectangleBorder  0.73
kisses           0.72
wrestling        0.72
wrestle          0.72
restling         0.69
hugging          0.69
abraço           0.68
Activations Density 0.260%