Explanations
Instances of the word "same".
Negative Logits
Token       Logit
own         -0.15
rious       -0.15
ầm          -0.14
untas       -0.14
cas         -0.14
more        -0.13
amburger    -0.13
propia      -0.13
iesta       -0.13
osemite     -0.13

Positive Logits
Token       Logit
-sex        0.34
exact       0.26
thing       0.26
kind        0.22
ãģı         0.21
sort        0.21
exact       0.21
Exact       0.18
Exact       0.18
-origin     0.18

Activation Density: 0.052%