Explanations
phrases and sentences that provide factual information or statistics
Negative Logits
cert      -0.07
erk       -0.07
kin       -0.07
olson     -0.06
dess      -0.06
cial      -0.06
dana      -0.06
elephant  -0.06
чай       -0.06
ez        -0.06
Positive Logits
about    0.08
heet     0.08
heets    0.07
facts    0.07
654      0.07
ох       0.07
316      0.07
_keeper  0.07
facts    0.07
amed     0.06
Activations Density: 0.004%