Explanations
references to self-sufficiency and personal responsibility
Negative Logits
Overall     -0.15
ubu         -0.15
edback      -0.15
Overall     -0.15
ingleton    -0.15
unny        -0.14
ady         -0.14
enger       -0.14
impunity    -0.13
uba         -0.13
Positive Logits
myself            0.45
ourselves         0.41
yourself          0.38
เอง               0.36
selber            0.33
自己              0.33
самостоятельно    0.33
Yourself          0.32
himself           0.31
herself           0.30
Activations Density 0.335%
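
For readers unfamiliar with how lists like the negative and positive logits above are usually produced, here is a minimal sketch. It assumes the common approach of projecting a feature's output direction through the model's unembedding matrix and ranking vocabulary tokens by the resulting score; the source does not confirm that this is how the values here were computed. The names `logit_effects` and `top_and_bottom` are hypothetical, and the direction, matrix, and scores are toy numbers for illustration, not taken from any model.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature direction (length d) through a d x vocab matrix.

    Returns one score per vocabulary token: the amount the feature would
    add to that token's logit per unit of activation (assumed setup).
    """
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][v] for i in range(len(direction)))
        for v in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int = 10) -> Tuple[List[Tuple[str, float]],
                                         List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example: 2-dimensional direction, 4-token vocabulary (illustrative values).
vocab = ["myself", "ourselves", "ubu", "impunity"]
direction = [1.0, -0.5]
unembed = [
    [0.40, 0.35, -0.10, -0.08],
    [-0.10, -0.12, 0.10, 0.10],
]

effects = logit_effects(direction, unembed)
negative, positive = top_and_bottom(effects, vocab, k=2)

for token, score in negative + positive:
    print(f"{token:<12}{score:+.2f}")
```

In this toy setup the reflexive pronouns rank at the positive end and the unrelated fragments at the negative end, which mirrors the shape of the lists above without reproducing how they were actually derived.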