INDEX
Explanations
mentions of significant individuals and their professional or personal relationships
Negative Logits
!).    -0.34
).     -0.31
}.     -0.31
}.     -0.30
?).    -0.29
!.     -0.28
!!.    -0.28
`.     -0.28
].     -0.28
”.     -0.28
Positive Logits
.,↵    0.36
,↵     0.35
..↵    0.32
↵      0.30
.'↵    0.29
.*↵    0.28
/↵     0.27
'↵     0.26
.↵     0.26
ï¼Į↵   0.25
Activation Density: 0.273%