Explanations
proper nouns of TV shows and universities
punctuation marks and specific character formatting in written content
Negative Logits
bro        -0.75
onian      -0.74
whis       -0.73
itaire     -0.71
eb         -0.71
cube       -0.70
kefeller   -0.70
woods      -0.70
runners    -0.69
cies       -0.68
Positive Logits
âĢ         1.90
«          1.33
**         1.31
*          1.28
¶          1.24
âĶ         1.22
â          1.19
***        1.19
ãĢĮ        1.18
®          1.16
Activations Density 0.231%