Explanations
proper nouns
instances of empty or irrelevant content
New Auto-Interp
Negative Logits
Token     Logit
ÏĢ        -0.80
ceive     -0.74
ptr       -0.71
200000    -0.71
assed     -0.68
¯         -0.68
thood     -0.68
Ïī        -0.68
leen      -0.68
!.        -0.67
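Several tokens above (ÏĢ, Ïī, ¯) look garbled. A likely reason, assumed here because the listing does not name its tokenizer, is that they are printed in the byte-to-unicode alphabet used by GPT-2-style byte-level BPE tokenizers, where every raw byte is shown as a printable character. The sketch below inverts that mapping; `bytes_to_unicode` and `decode_token` are illustrative helpers written for this note, not a library API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2-style byte-to-unicode table (assumed tokenizer).

    Printable bytes map to themselves; every other byte is shifted to a
    code point at 256 or above so that each byte has a visible character.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Reverse table: displayed character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a displayed byte-level token back to bytes and decode as UTF-8.

    A token holding only part of a multi-byte character (such as a lone
    continuation byte) cannot be decoded alone, so it yields U+FFFD.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["ÏĢ", "Ïī", "¯", "ceive"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ÏĢ is the two bytes 0xCF 0x80 (UTF-8 for π) and Ïī is 0xCF 0x89 (ω), while ¯ is the single byte 0xAF, a fragment of some longer character rather than a complete one.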

Positive Logits
Token     Logit
resa      1.45
odore     1.37
oret      1.33
latter    1.15
latest    1.10
ories     1.07
biggest   0.98
irony     0.95
idea      0.94
earliest  0.92
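The listing does not state how these logit values were computed. One common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and report the highest- and lowest-scoring vocabulary tokens. The toy sketch below illustrates that idea in plain Python; `logit_effects`, `top_and_bottom`, and every vector and number in it are hypothetical and do not reproduce the values in the tables.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score each token by the dot product of the feature's decoder
    direction (length d_model) with that token's unembedding column.

    `unembed` is indexed as unembed[i][j] with i over d_model and j over
    the vocabulary (a hypothetical d_model x vocab layout).
    """
    scores = {}
    for j, token in enumerate(vocab):
        scores[token] = sum(d * unembed[i][j] for i, d in enumerate(decoder_dir))
    return scores


def top_and_bottom(scores, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    return ranked[::-1][:k], ranked[:k]


# Toy model: d_model = 2, five-token vocabulary, made-up weights.
decoder_dir = [1.0, 0.5]
unembed = [
    [0.2, -0.4, 1.0, 0.0, -0.2],
    [0.6, 0.2, -0.4, 0.8, -0.4],
]
vocab = ["latter", "ceive", "idea", "ptr", "thood"]

scores = logit_effects(decoder_dir, unembed, vocab)
top, bottom = top_and_bottom(scores, k=2)
print("positive:", top)
print("negative:", bottom)
```

With a real model the vocabulary has tens of thousands of entries, so the same projection would normally be done as one matrix-vector product; some tools also fold in a final layer norm, which this sketch ignores.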
Activation Density: 0.399%
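The figure above is read here as the fraction of dataset token positions on which the feature's activation is positive; that definition is an assumption, since the listing gives only the number. Under it, 0.399% means the feature fires on roughly 4 of every 1,000 tokens. A minimal sketch of the computation, with a hypothetical `activation_density` helper and toy data:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    The default threshold of 0 ("the feature fires at all") is an
    assumed definition, not one stated by the source.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: the feature fires at 4 of 1,000 token positions.
acts = [0.0] * 1000
for pos in (10, 250, 600, 999):
    acts[pos] = 1.3

print(f"{activation_density(acts):.3%}")  # 0.400%
```

Raising `threshold` above zero would count only stronger activations and report a lower density; which cutoff a given visualization uses is not stated here.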