INDEX
    Negative Logits
    Token           Logit
    " Flavoring"    -0.80
    "ctuary"        -0.78
    " Evening"      -0.74
    "ETH"           -0.70
    "ews"           -0.68
    " Seraph"       -0.68
    "ILY"           -0.68
    "abama"         -0.68
    "vironment"     -0.67
    " Atmosp"       -0.66
    Positive Logits
    Token           Logit
    "button"        0.97
    "bell"          0.95
    "holes"         0.90
    "hole"          0.90
    "oola"          0.90
    " pus"          0.87
    " button"       0.87
    " buttons"      0.82
    "header"        0.78
    " clicked"      0.78
    Activation Density: 0.019%

    No Known Activations