INDEX
Explanations
proper nouns or names
specific proper nouns and entities related to locations, brands, or notable figures
Negative Logits
]."     -0.78
..."    -0.75
().     -0.72
.�      -0.72
}.      -0.72
)."     -0.71
.).     -0.70
)).     -0.69
.<      -0.68
¶ħ      -0.66
Positive Logits
Facts          0.64
Basics         0.61
revolves       0.59
hinges         0.57
boosters       0.57
renaissance    0.56
basics         0.56
Expand         0.54
consists       0.53
undrum         0.53
Activations Density 0.771%