    Negative Logits
        " ageing"       -0.76
        " defic"        -0.74
        " aging"        -0.74
        " rall"         -0.71
        " conclud"      -0.68
        " grounding"    -0.68
        " winters"      -0.65
        " inequ"        -0.61
        " challeng"     -0.61
        " botched"      -0.60
    Positive Logits
        "co"             1.41
        "twitter"        0.92
        "redd"           0.91
        "youtube"        0.90
        "shirts"         0.89
        "wikipedia"      0.87
        "github"         0.86
        "assetsadobe"    0.86
        "coon"           0.86
        "facebook"       0.84
    Activation Density: 0.011%

    No Known Activations