INDEX
    Explanations

    instances of the letter 't' or related tokens

    Negative Logits
    ĪĴ            -0.93
     Dane         -0.86
     Pigs         -0.71
     Karma        -0.71
     Desmond      -0.70
     Decay        -0.70
     Dull         -0.69
     Wonderland   -0.69
     tracts       -0.66
     Corpus       -0.66
    Positive Logits
    youtube       0.99
    facebook      0.88
    twitter       0.82
    etsy          0.81
    gallery       0.80
    ileaks        0.79
    yp            0.78
    cher          0.77
    orah          0.76
    github        0.76
    Activation Density: 0.033%

    No Known Activations