Explanations
possessive or controlling actions
Negative Logits
defiant       0.39
defiance      0.37
داخل          0.36
subversive    0.36
പോലുള്ള       0.35
کمتر          0.34
暐            0.34
தை            0.33
dagen         0.33
внутри        0.33
Positive Logits
threatening   0.36
threatened    0.34
encro         0.34
merciless     0.33
reactant      0.33
threatens     0.33
accusing      0.33
bully         0.32
wanting       0.30
reciproc      0.30
Activations Density: 0.159%