INDEX
Explanations
names of people or organizations
proper nouns, specifically names of people and organizations
Negative Logits
!),        -0.84
!",        -0.83
!).        -0.79
!'"        -0.74
!".        -0.72
.」        -0.71
anymore    -0.71
!:         -0.68
toget      -0.67
psychiat   -0.67
Positive Logits
TP               0.95
RTX              0.90
IMAGES           0.87
<|endoftext|>    0.84
REUTERS          0.80
UNITED           0.79
Buy              0.78
Protesters       0.77
Dug              0.75
Generic          0.74
Activation density: 0.045%