INDEX
Explanations
themes related to personal identity and self-expression
Negative Logits
Token      Logit
eya        -0.15
ÏĢε        -0.14
blind      -0.14
abase      -0.13
394        -0.13
urahan     -0.13
Batch      -0.13
onas       -0.13
urity      -0.13
Gross      -0.13
Positive Logits
Token      Logit
selves     0.33
version    0.29
-version   0.27
versions   0.27
Version    0.26
Versions   0.26
.version   0.26
oneself    0.25
versions   0.25
person     0.25
Activations Density: 0.215%