INDEX
    Explanations
    Negative Logits
     Drop       -0.07
    %;"         -0.07
    lettes      -0.07
    pper        -0.06
    bble        -0.06
    ancybox     -0.06
     drop       -0.06
    %</         -0.06
    ,*          -0.06
     Pron       -0.06
    Positive Logits
    aim         0.12
    ai          0.12
    ao          0.11
    ais         0.10
    Ao          0.10
    au          0.10
    aal         0.10
    AI          0.09
    aed         0.09
    ain         0.09
    Act Density 0.457%

    No Known Activations