INDEX
Explanations
references to perception, interpretation, and understanding of concepts or situations
Negative Logits
Pants     -0.16
ĶåĽŀ      -0.15
eat       -0.14
lon       -0.14
izu       -0.14
odyn      -0.14
aver      -0.13
ron       -0.13
ylon      -0.13
Timeout   -0.13
Positive Logits
è¿Ļæĺ¯    0.19
it        0.18
NullOr    0.18
phas      0.18
hound     0.17
Äijây     0.17
sebagai   0.16
herself   0.15
isas      0.15
himself   0.15
Activations Density 0.150%