INDEX
    Explanations
    Negative Logits
        Token          Logit
        " circle"      -2.22
        "circle"       -1.98
        " circles"     -1.84
        " Circle"      -1.77
        "Circle"       -1.70
        " círculo"     -1.60
        " CIRCLE"      -1.59
        "circles"      -1.54
        " Circles"     -1.49
        "CIRCLE"       -1.48
    Positive Logits
        Token          Logit
        " of"           0.77
        "+#+#"          0.65
        " as"           0.64
        " for"          0.63
        ","             0.63
        " in"           0.62
        " with"         0.57
        " from"         0.57
        " and"          0.56
        "."             0.54
    Act Density 0.958%

    No Known Activations