Explanations
violent or aggressive actions or intentions
actions related to punishment and control
Negative Logits
Token          Logit
ItemTracker    -0.57
Downloadha     -0.56
ãĤ´ãĥ³         -0.55
dating         -0.53
dotted         -0.53
nih            -0.52
Nos            -0.51
fashioned      -0.50
arij           -0.50
ebook          -0.49
Positive Logits
Token          Logit
uate           0.80
ISE            0.63
enance         0.62
ulate          0.59
them           0.58
itate          0.58
him            0.58
igate          0.58
oneself        0.55
RELEASE        0.55
Activations Density 0.660%
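
The density figure above is not defined on this page. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is nonzero, is shown below. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration; they are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 5,000 token positions, 33 of which activate.
sample = [1.5] * 33 + [0.0] * 4967
print(f"{activation_density(sample):.3%}")  # prints 0.660%
```

Under this assumption, a density of 0.660% would mean the feature fires on roughly 1 in 150 sampled tokens.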