INDEX
Explanations
elements related to personal experiences and characteristics
Negative Logits
Token      Logit
.↵         -0.30
).↵        -0.25
ãĢĤ↵       -0.21   (byte-level form of 。↵)
.↵↵        -0.20
>.↵        -0.20
}.↵        -0.20
".↵        -0.19
'.↵        -0.19
."↵        -0.19
().↵       -0.18
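Positive Logits
Token        Logit
zwar         0.19
”ï¼Į         0.16   (byte-level form of ”,)
ìŀĪëĬĶëį°    0.16   (byte-level form of 있는데)
ãĢij,         0.15   (byte-level form of 】,)
ãģĬãĤĬ       0.15   (byte-level form of おり)
(),          0.15
_______,     0.14
AFX          0.14
ãĢĭï¼Į       0.14   (byte-level form of 》,)
!),          0.14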
Activations Density 0.961%