INDEX
Explanations
references to the word "original" and its variations
Negative Logits
mere        -0.16
aggable     -0.16
recent      -0.16
arine       -0.16
untu        -0.15
cken        -0.14
ua          -0.14
agged       -0.14
Wagner      -0.14
aul         -0.14
Positive Logits
/original   0.26
ity         0.22
arily       0.21
mente       0.20
-fashioned  0.18
isé         0.17
arity       0.17
부터        0.17
-language   0.17
ised        0.16
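Logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and keeping the largest and smallest entries. The page does not say how its values are computed, so this is a minimal sketch assuming that definition and ignoring any layer-norm scaling; `feature_logit_effects`, `top_and_bottom`, the token names, and the toy matrices are all hypothetical.

```python
# Hypothetical sketch: direct logit effect of one feature.
# Assumes the listed logits are decoder_dir @ W_U, with no layer-norm scaling.

def feature_logit_effects(decoder_dir, unembed):
    """Return decoder_dir @ unembed as one value per vocabulary token.

    decoder_dir: list of d_model floats (the feature's decoder row).
    unembed: d_model rows, each a list of vocab_size floats (W_U).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, tokens, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # largest effect first
    negative = ranked[:k]        # most negative effect first
    return positive, negative


# Toy example: d_model = 3 and a vocabulary of four made-up tokens.
tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, 0.0, 0.5]
unembed = [
    [0.2, 0.1, -0.1, 0.0],
    [0.0, 0.3, 0.0, -0.2],
    [0.1, 0.2, -0.1, -0.4],
]
effects = feature_logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, tokens, k=2)
print("positive:", [(t, round(v, 2)) for t, v in positive])
print("negative:", [(t, round(v, 2)) for t, v in negative])
```

With a real model the same projection would run over the full vocabulary, and `k` would be 10 to match the lists shown here.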
Activation Density: 0.030%
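Activation density is usually read as the fraction of sampled token positions on which the feature is active, so 0.030% corresponds to roughly 3 positions in every 10,000. The page does not state the firing threshold or the sample size, so this minimal sketch assumes a threshold of zero; `activation_density` and the toy activations are hypothetical.

```python
# Hypothetical sketch: activation density as the share of token positions
# where the feature's activation is above a threshold (assumed to be 0).

def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy example: the feature fires on 3 of 10,000 token positions.
acts = [0.0] * 9997 + [1.2, 0.4, 2.0]
density = activation_density(acts)
print(f"Activation density: {density:.3%}")
```

A density this low means the feature is very sparse, which fits an explanation tied to a single word and its variants.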