INDEX
    Explanations
    Negative Logits
    " Bou"           -0.07
    " Desire"        -0.06
    " gaze"          -0.06
    " xls"           -0.06
    "「你"            -0.06
    " borr"          -0.06
    " imports"       -0.06
    " bro"           -0.06
    "who"            -0.06
                     -0.06
    Positive Logits
    " medal"          0.14
    " Medal"          0.12
    " medals"         0.09
    " stal"           0.07
    " useDispatch"    0.07
    " jedin"          0.07
    "_checks"         0.07
                      0.06
    "als"             0.06
    " didSelect"      0.06
    Activation Density: 0.003%

    No Known Activations