    Negative Logits
    lectic       -0.78
    icut         -0.78
    anwhile      -0.75
    ij士         -0.73
    uries        -0.70
    selves       -0.70
    arters       -0.70
     Gutenberg   -0.69
    yrim         -0.69
    ubuntu       -0.69
    Positive Logits
     cub         0.94
    bear         0.90
     hugs        0.85
     claws       0.84
     Gry         0.84
     hug         0.83
    beit         0.83
    xual         0.82
     Grizz       0.82
     paws        0.80
    Activation Density: 0.026%

    No Known Activations