Explanations
negative emotional experiences or actions
concepts related to trauma and emotional pain
Negative Logits
interchange      -0.61
VL               -0.61
indicating       -0.58
uploads          -0.57
successors       -0.57
ifer             -0.57
indications      -0.56
suggest          -0.56
interestingly    -0.56
alleging         -0.56
Positive Logits
oneself           1.04
Yourself          0.81
yourself          0.72
arenthood         0.69
thood             0.68
shitty            0.66
sweaty            0.64
______            0.63
solitude          0.61
drunk             0.61
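The two logit lists read as the tokens whose output logits this feature most decreases and most increases. Below is a minimal sketch of how such lists can be derived, assuming the common SAE dashboard convention of projecting the feature's decoder vector through the model's unembedding matrix. The function names, the decoder vector, and the unembedding matrix are hypothetical toy values, not data from this feature.

```python
# Sketch: rank vocabulary tokens by a feature's direct logit effect.
# ASSUMPTION: effect = decoder_vector @ W_U (the usual SAE dashboard convention).
# All names and numbers here are hypothetical toy values.
from typing import Dict, List, Tuple


def logit_effects(decoder: List[float], w_u: List[List[float]],
                  vocab: List[str]) -> Dict[str, float]:
    """Project a feature's decoder vector (d_model) through W_U (d_model x vocab)."""
    return {
        tok: sum(decoder[i] * w_u[i][j] for i in range(len(decoder)))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted tokens (descending) and k most-suppressed (ascending)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[-k:][::-1], ranked[:k]


vocab = ["oneself", "solitude", "suggest", "uploads"]
decoder = [1.0, -0.5]              # hypothetical 2-dim decoder direction
w_u = [                            # hypothetical 2 x 4 unembedding matrix
    [0.8, 0.4, -0.3, -0.2],
    [-0.4, 0.0, 0.5, 0.6],
]
effects = logit_effects(decoder, w_u, vocab)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; a real vocabulary would use a matrix library instead of nested loops.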
Activation density: 0.898%
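A density of 0.898% corresponds to roughly 9 active tokens in every 1,000. The sketch below shows the arithmetic, assuming density is the percentage of token positions where the feature's activation is strictly above zero (as with a ReLU-style encoder). The function name, threshold parameter, and activation values are hypothetical.

```python
# Sketch: activation density as the percentage of positions where a feature fires.
# ASSUMPTION: "fires" means activation > threshold (default 0.0); the
# activations below are hypothetical toy values.
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


acts = [0.0] * 995 + [1.3, 0.7, 2.1, 0.2, 0.9]   # 5 active out of 1,000
print(f"{activation_density(acts):.3f}%")
```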