INDEX
    Explanations
    Negative Logits
    Token            Logit
    "*/("            -0.85
    " CBI"           -0.77
    "hani"           -0.76
    " Imran"         -0.71
    " Rove"          -0.66
    " Humanity"      -0.65
    " Anonymous"     -0.64
    " Ic"            -0.64
    " Alexandria"    -0.62
    " Learned"       -0.62
    Positive Logits
    Token            Logit
    "beans"          1.72
    "bean"           1.71
    "food"           1.03
    "bowl"           0.92
    "seed"           0.88
    " beans"         0.88
    "lda"            0.88
    " sauce"         0.86
    "char"           0.86
    "wh"             0.84
    Act Density 0.002%

    No Known Activations