INDEX
Explanations
instances and mentions of a specific name, likely referring to a person
Negative Logits
Token    Logit
-        -0.54
(blank)  -0.53
↵↵       -0.53
(        -0.52
,        -0.50
↵        -0.50
.        -0.49
1        -0.47
in       -0.47
China    -0.46
Positive Logits
Token       Logit
EconPapers  1.04
[@BOS@]     0.98
<unused28>  0.98
<unused41>  0.98
<unused14>  0.98
<unused3>   0.98
<unused16>  0.98
<unused17>  0.98
<unused8>   0.98
<pad>       0.98
Activations Density: 0.185%