    Negative Logits
    Token                Logit
    " returns"           -0.07
    " dis"               -0.07
    " earns"             -0.07
    "Bundle"             -0.07
    " gas"               -0.07
    "borg"               -0.06
    (no token shown)     -0.06
    "פר"                 -0.06
    " divor"             -0.06
    "_window"            -0.06
    Positive Logits
    Token                Logit
    (no token shown)     0.08
    "speaker"            0.07
    "=len"               0.07
    "ęż"                 0.07
    "listening"          0.07
    ".prof"              0.06
    " meant"             0.06
    "\toperator"         0.06
    " gust"              0.06
    " alguien"           0.06
    Activation Density: 0.006%

    No Known Activations