INDEX
Explanations
instances of the word "naive" and its variations
Negative Logits
uing      -0.16
.opend    -0.15
olf       -0.15
esar      -0.15
eree      -0.15
ricks     -0.15
Ìģ        -0.14
sund      -0.14
auled     -0.14
.bd       -0.14
Positive Logits
ive               0.19
IVE               0.18
yer               0.18
ivé               0.17
ve                0.17
Zot               0.16
ively             0.16
ivet              0.16
shire             0.16
creativecommons   0.16
Activations Density 0.001%