INDEX
    Explanations

    values and brain regions

    Negative Logits
    ate         0.55
                0.52
    Modal       0.51
    mills       0.50
    ot          0.49
    ham         0.48
    namen       0.47
    binary      0.46
    chefs       0.46
    mau         0.46
    Positive Logits
     ottenere   0.55
                0.52
     coisa      0.50
     defaul     0.48
                0.48
     implement  0.47
                0.47
     jit        0.47
     enero      0.46
     TLS        0.46
    Act Density 0.001%

    No Known Activations