Explanations
mentions of articles and publications
instances of the word "article" and its variations
Negative Logits
Token        Logit
Genie        -0.73
ELF          -0.72
Ĭ±           -0.68
Mouth        -0.65
_>           -0.65
sed          -0.65
ItemImage    -0.64
Governors    -0.63
AFB          -0.62
Positive Logits
Token        Logit
assumes       0.98
summarizes    0.96
contains      0.90
represents    0.89
illustrates   0.84
teaches       0.84
requires      0.83
cannot        0.82
describes     0.82
relies        0.81
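The two logit lists are presumably the tokens with the most negative and most positive values in the feature's logit-effect vector (on SAE dashboards this is typically the feature's decoder direction projected through the model's unembedding; that detail is an assumption, not stated here). A minimal sketch of picking such lists, assuming the effects are already available as a plain dict from token string to value. The function name `top_logits` is hypothetical, and the sample values are copied from the lists above.

```python
def top_logits(
    effects: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return (k most negative, k most positive) tokens by logit effect.

    Hypothetical helper: assumes `effects` maps token strings to the
    feature's (precomputed) effect on each token's logit.
    """
    # Sort ascending so the most negative effects come first.
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]
    # Reverse the ascending order so the most positive effects come first.
    positive = ranked[::-1][:k]
    return negative, positive


# Sample values taken from the dashboard lists above.
effects = {
    "Genie": -0.73,
    "ELF": -0.72,
    "Mouth": -0.65,
    "assumes": 0.98,
    "summarizes": 0.96,
    "contains": 0.90,
}
neg, pos = top_logits(effects, 2)
print(neg)
print(pos)
```

With the full vocabulary as input, the same ranking would yield lists like the ten-entry tables shown.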
Activation density: 0.125%
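Activation density is commonly defined as the fraction of token positions on which the feature's activation is above zero; assuming that definition, 0.125% means the feature fires on roughly 1 in every 800 tokens. A minimal sketch under that assumption (the helper name `activation_density` and the toy activations are hypothetical):

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumes the common "fires if > 0" definition; some dashboards may use
    a different threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical example: 800 token positions, the feature fires on one of them.
acts = [0.0] * 800
acts[417] = 3.2
density = activation_density(acts)
print(f"{density:.3%}")  # 0.125%
```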