INDEX
Explanations
mentions of scientific research being published in journals
occurrences of the word "published."
Negative Logits
llan        -0.90
hart        -0.77
xa          -0.71
vette       -0.68
Architects  -0.68
Ĭ±          -0.67
awakened    -0.66
aturation   -0.65
uth         -0.65
ichael      -0.65
Positive Logits
lishing     0.99
excerpts    0.93
lisher      0.92
lishes      0.75
Ô           0.70
behavi      0.70
itatively   0.69
exploits    0.69
URL         0.69
newsp       0.68
Activations Density: 0.027%