INDEX
Explanations
concepts related to the pursuit and misconceptions of happiness and well-being
Negative Logits
itten    -0.16
alled    -0.15
egrity   -0.15
alian    -0.15
GRID     -0.14
opol     -0.14
arked    -0.14
алом     -0.13
ihad     -0.13
zia      -0.13
Positive Logits
以为         0.33
assume       0.33
assumes      0.33
assumption   0.31
assuming     0.30
assume       0.30
mistake      0.29
assumed      0.28
think        0.28
THINK        0.28
Activations Density 0.647%
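
As a rough illustration of the density figure above, the sketch below assumes that activation density means the fraction of token positions on which the feature's activation is above zero; the dashboard does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" counts a position as active when its activation
    is strictly greater than the threshold (zero by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 1000 token positions, 6 of them with a nonzero activation.
sample = [0.0] * 994 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.647% would mean the feature fires on roughly 6 or 7 of every 1000 token positions.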