INDEX
    Explanations
    Negative Logits
    " Beef"       -0.07
    "advert"      -0.06
    " Whites"     -0.06
    "_menus"      -0.06
    "Requested"   -0.06
    " persec"     -0.06
    "_serv"       -0.06
    " сог"        -0.06
    "しても"         -0.06
    "Mini"        -0.06
    Positive Logits
    " describe"   0.07
    " break"      0.07
    " vintage"    0.07
    "],"          0.07
    " recruited"  0.07
    " fired"      0.06
    "]\"          0.06
    " гли"        0.06
    " velkou"     0.06
    "рит"         0.06
    Act Density 0.000%

    No Known Activations