Explanations
elements of familial relationships and emotional connections in narratives
Negative Logits
Token     Logit
``        -0.20
",        -0.18
âĢIJ       -0.18
,’’       -0.17
âĢIJ       -0.16
"].(      -0.16
...↵      -0.16
,,        -0.16
``        -0.15
',        -0.15
Positive Logits
Token     Logit
_↵↵       0.60
_↵        0.56
._↵↵      0.55
._↵       0.52
_         0.51
_.        0.49
._        0.44
:_        0.44
_,        0.43
,_        0.43
Activations Density 0.088%