Explanations
themes related to relationships and identity
Negative Logits
Token     Logit
妃        -0.14
好的      -0.13
рик       -0.13
ahir      -0.13
답        -0.13
idable    -0.13
vero      -0.12
688       -0.12
opsy      -0.12
387       -0.12
Positive Logits
Token     Logit
thing     0.88
thing     0.65
Thing     0.65
stuff     0.60
Thing     0.54
cosa      0.45
coisa     0.42
stuff     0.42
things    0.42
Stuff     0.41
Activations Density 0.430%