INDEX
Explanations
words related to self, such as self-supporting, self-medicated, self-defense, self-esteem, and self-employed
references to self-related concepts
Negative Logits
Token         Logit
Flavoring     -0.84
GOODMAN       -0.82
Slay          -0.82
Amend         -0.77
Fever         -0.71
IUM           -0.70
Cosponsors    -0.68
Syndicate     -0.66
Sierra        -0.65
Orchestra     -0.65
Positive Logits
Token         Logit
destruct      1.13
destruct      1.07
self          1.02
upload        0.89
self          0.87
Self          0.84
diseng        0.84
lect          0.78
explanatory   0.78
aram          0.77
Activations Density 0.016%
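
To put the density figure in concrete terms, here is a minimal sketch. It assumes "Activations Density" means the percentage of sampled tokens on which the feature has a nonzero activation; that reading is an assumption, not something stated on this page, and the function name is hypothetical.

```python
# Assumption: density is the percentage of tokens on which the feature
# fires at all. `expected_firing_tokens` is a hypothetical helper name.
def expected_firing_tokens(density_percent: float, n_tokens: int) -> int:
    """Approximate number of tokens with a nonzero activation."""
    # Convert the percentage to a fraction, scale by the token count,
    # and round to the nearest whole token.
    return round(density_percent / 100 * n_tokens)


# 0.016% of one million tokens
print(expected_firing_tokens(0.016, 1_000_000))
```

Under that assumption, the feature would fire on roughly 160 of every million tokens, which is consistent with a sparse feature tied to a narrow set of "self"-related tokens.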