    Negative Logits
    Token        Logit
    "ington"     -0.93
    "heimer"     -0.84
    "zig"        -0.81
    "eties"      -0.79
    "ety"        -0.78
    "kus"        -0.76
    "ebook"      -0.76
    "lib"        -0.74
    "izable"     -0.74
    "ias"        -0.74
    Positive Logits
    Token        Logit
    " PHOTO"     1.15
    " IMAGES"    1.10
    "FILE"       0.95
    "OTOS"       0.91
    " VIDEOS"    0.88
    "ACT"        0.83
    "OUT"        0.81
    "BIL"        0.79
    "HAEL"       0.78
    " EVENT"     0.77
    Activation Density: 0.025%

    No Known Activations