INDEX
    Explanations
    Negative Logits
    "\tsc"       -0.07
    "wat"        -0.07
    "_ud"        -0.07
    "Sam"        -0.07
    " feet"      -0.06
    " Sheikh"    -0.06
    "fuck"       -0.06
    " compens"   -0.06
    "Autom"      -0.06
    "embourg"    -0.06
    Positive Logits
    " rise"      0.13
    " Rise"      0.12
    " rose"      0.11
    " rising"    0.10
    " Rising"    0.10
    " rises"     0.09
    " arise"     0.09
    " Rose"      0.08
    " arising"   0.08
    " arises"    0.08
    Activation Density: 0.015%

    No Known Activations