Explanations
phrases and expressions related to critique and moral reasoning
Negative Logits
en          -0.18
Parl        -0.15
ë§ī         -0.15
äft         -0.15
zan         -0.14
isay        -0.14
vore        -0.14
iset        -0.13
onom        -0.13
iken        -0.13
Positive Logits
าà¸ĺ         0.16
.accounts   0.15
-cookie     0.15
trieve      0.15
umed        0.15
PUR         0.14
PLIED       0.14
orth        0.14
ãĤ¤ãĥĪ      0.14
.builders   0.14
Activations Density 0.127%