INDEX
Explanations
words related to published works or academic studies
instances of the word "published."
Negative Logits
aho       -0.80
nea       -0.80
xa        -0.77
llan      -0.76
hart      -0.76
atra      -0.75
avery     -0.73
ichael    -0.72
ascar     -0.72
uppet     -0.72
Positive Logits
lishing           1.23
lisher            1.13
published         1.02
published         1.00
publication       0.98
publishes         0.96
behavi            0.94
lishes            0.93
RELE              0.92
DragonMagazine    0.91
Activations Density: 0.022%