INDEX
Explanations
instances of the word "original" or related terms
Negative Logits
mere   -0.15
untu   -0.15
&W     -0.15
mere   -0.15
cken   -0.15
Wiley  -0.14
hle    -0.14
ranks  -0.14
ow     -0.14
arta   -0.14
Positive Logits
/original    0.23
ity          0.19
mente        0.18
undos        0.17
-fashioned   0.17
ities        0.17
ised         0.16
.Formatter   0.15
arily        0.15
-original    0.15
Activations Density: 0.027%