Explanations
References to specific individuals and things within casual or narrative conversations
Negative Logits
ASY  -0.15
Performing  -0.15
сім  -0.15
aska  -0.14
uese  -0.14
ASK  -0.14
classic  -0.14
.Model  -0.14
sak  -0.14
isco  -0.13
Positive Logits
thing  0.27
stuff  0.24
thing  0.23
Stuff  0.20
Thing  0.20
guy  0.18
stuff  0.18
MUX  0.17
(thing  0.16
zení  0.16
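Lists like the two above are usually produced by projecting the feature's decoder direction through the model's unembedding and keeping the most negative and most positive entries. The sketch below assumes exactly that definition; the function names `logit_effects` and `top_logits`, the row-per-token layout of `unembed`, and the toy vocabulary are all hypothetical illustrations, not the dashboard's actual code.

```python
def logit_effects(decoder_dir, unembed):
    """Dot the (assumed) feature decoder direction with each token's unembedding row.

    decoder_dir: list of floats, length d_model (hypothetical input).
    unembed: one list of floats per vocabulary token, each of length d_model.
    """
    return [sum(d * u for d, u in zip(decoder_dir, row)) for row in unembed]


def top_logits(effects, vocab, k=10):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    negative = sorted(pairs, key=lambda p: p[1])[:k]                 # ascending
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]   # descending
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["thing", "stuff", "ASY", "guy"]
unembed = [[0.3, 1.0], [0.2, 0.0], [-0.1, 0.0], [0.05, 0.0]]
effects = logit_effects([1.0, 0.0], unembed)
negative, positive = top_logits(effects, vocab, k=2)
print([t for t, _ in negative])  # ['ASY', 'guy']
print([t for t, _ in positive])  # ['thing', 'stuff']
```

Because byte-pair vocabularies contain both `thing` and ` thing` (with a leading space), the same surface word can appear more than once in such a list, as it does above.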
Activation density: 0.128%
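Activation density is commonly reported as the share of tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the function name `activation_density` and the `threshold` parameter are hypothetical. Under that assumption, a figure of 0.128% corresponds to roughly 128 firing tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 128 firing tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_872 + [0.5] * 128
print(activation_density(sample))  # approximately 0.128
```

A low density like this one is typical of a narrowly scoped feature: it stays silent on almost all text and activates only in the contexts named in its explanation.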