INDEX
Explanations
references to articles
instances of the word "article."
Negative Logits
selves       -0.79
Ĭ±           -0.71
Governors    -0.69
Lens         -0.65
Tokens       -0.64
Calling      -0.63
Magikarp     -0.62
Sleeping     -0.61
Cases        -0.59
Genie        -0.59
Positive Logits
contains      1.23
summarizes    1.16
assumes       1.14
represents    1.13
illustrates   1.02
provides      1.00
is            0.99
reflects      0.98
includes      0.97
relies        0.97
Activations Density: 0.114%