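    Negative Logits
    (Tokens are quoted; a leading space inside the quotes is part of the token.)
    Token         Logit
    "glers"       -0.83
    " hail"       -0.68
    " cones"      -0.62
    " bother"     -0.58
    "fires"       -0.58
    " fork"       -0.58
    " cigars"     -0.57
    " spiders"    -0.57
    " Typhoon"    -0.57
    " snakes"     -0.56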
    Positive Logits
    Token         Logit
    "illary"       1.00
    "wered"        1.00
    "heim"         0.94
    "imity"        0.93
    "otropic"      0.88
    "obic"         0.83
    "ibo"          0.82
    "orage"        0.81
    "andre"        0.79
    "estic"        0.76
    Activation Density: 0.041%

    No Known Activations