INDEX
Explanations
proper nouns, particularly names of people and places
Negative Logits
Token      Logit
ADOS       -0.15
alted      -0.14
Dispatch   -0.14
ndx        -0.14
guild      -0.14
stroy      -0.14
splice     -0.14
imple      -0.14
stands     -0.13
rir        -0.13
Positive Logits
Token      Logit
hom        0.16
代         0.16
advances   0.16
singled    0.15
łí         0.14
Lf         0.14
walked     0.14
-ajax      0.14
grounded   0.14
Hom        0.14
Activations Density: 0.005%