INDEX
Explanations
Twitter usernames
occurrences of the end-of-text token
Negative Logits
shortage      -0.75
fulfilling    -0.73
bite          -0.72
venom         -0.72
tense         -0.72
claws         -0.72
loan          -0.72
bites         -0.71
captcha       -0.71
pacing        -0.71
Positive Logits
Writ          1.41
_             1.36
Official      1.29
Stud          1.23
Reports       1.23
Ide           1.22
News          1.19
Jew           1.18
WithNo        1.17
Games         1.17
Activations Density: 0.163%