INDEX
    Explanations

    references to influential figures and their ideas or theories

    Negative Logits
    Token            Logit
    " Completed"     -0.17
    " Done"          -0.15
    "Ñĥз"            -0.14
    " chaired"       -0.14
    " caused"        -0.14
    "áli"            -0.14
    "done"           -0.14
    "llen"           -0.14
    "iei"            -0.13
    " performed"     -0.13
    Positive Logits
    Token            Logit
    " advanced"      0.54
    "advanced"       0.45
    " Advanced"      0.38
    "Advanced"       0.36
    "_advanced"      0.30
    " prop"          0.30
    " prom"          0.28
    " avanz"         0.28
    " esp"           0.27
    " put"           0.25
    Act Density 0.322%

    No Known Activations