INDEX
Explanations
references to personal identity and self-description
Negative Logits
-0.18  egin
-0.16  adopt
-0.15  ä¸ĢåĮº
-0.15  ÑĢиг
-0.15  ıt
-0.14  rying
-0.14  expand
-0.14  lix
-0.14
-0.13  starter
Positive Logits
0.20  frequent
0.19  frequ
0.19  regularly
0.19  moon
0.18  collect
0.18  moon
0.17  Collect
0.17  Collect
0.16  frequently
0.16  freq
Activations Density 0.415%