    Negative Logits
    " fort"        -0.74
    " flush"       -0.69
    " malaria"     -0.65
    " betray"      -0.64
    " places"      -0.63
    " construct"   -0.63
    " levy"        -0.63
    " mal"         -0.62
    " increment"   -0.62
    " moth"        -0.61
    Positive Logits
    "Video"      3.92
    "video"      2.52
    "VIDEO"      2.32
    " Video"     2.27
    " VIDEO"     1.99
    " Videos"    1.89
    " video"     1.85
    "videos"     1.83
    "Audio"      1.71
    "ideos"      1.62
    Activation Density: 0.012%

    No Known Activations