INDEX
Explanations
compound words and phrases that describe self-management and self-destructive behaviors
Negative Logits
utin      -0.15
amedi     -0.15
ank       -0.14
undry     -0.14
Van       -0.14
Jensen    -0.13
argent    -0.13
Mis       -0.13
asper     -0.13
oya       -0.13
Positive Logits
/self     0.46
Self      0.35
(Self     0.32
self      0.30
Self      0.30
SELF      0.28
-self     0.27
self      0.26
,self     0.26
SELF      0.24
Activations Density 0.036%