INDEX
Explanations
concepts and phrases related to beliefs and moral values
Negative Logits
erson       -0.15
ye          -0.14
erc         -0.14
yna         -0.14
ez          -0.14
Wire        -0.13
ive         -0.13
hipster     -0.13
-Methods    -0.13
clip        -0.13
Positive Logits
humans      0.17
Every       0.16
사람은      0.16
every       0.16
knowledge   0.15
success     0.15
life        0.15
bove        0.15
ooks        0.14
earning     0.14
Activations Density 0.433%
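
The density figure above can be read as the share of token positions on which the feature fires. The sketch below shows one way such a percentage might be computed. It is an illustration under stated assumptions, not the dashboard's actual method: the function name activation_density, the zero threshold, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: a feature counts as "active" at a position when its
    activation exceeds the threshold (0.0 here). The dashboard may
    define this differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data (hypothetical): 1000 token positions, 4 of them active.
toy = [0.0] * 996 + [1.2, 0.3, 2.5, 0.7]
print(f"{activation_density(toy):.3f}%")  # prints "0.400%"
```

On this reading, a density of 0.433% would mean the feature is active on roughly 4 of every 1000 sampled tokens.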