    Negative Logits
    " Ortiz"       -0.07
    "JOR"          -0.07
    " виход"       -0.07
    "407"          -0.07
    " redd"        -0.06
    " Voc"         -0.06
    " Lowell"      -0.06
    "\tvector"     -0.06
    " divid"       -0.06
    "y"            -0.06
    Positive Logits
    " Machine"      0.11
    "Machine"       0.11
    " machine"      0.11
    " машин"        0.09
    " MACHINE"      0.09
    (unrendered)    0.09
    " machinery"    0.08
    "machine"       0.08
    " shame"        0.08
    " machines"     0.08
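Dashboards like this one commonly build the positive and negative logit lists by projecting a feature's decoder direction onto the model's unembedding matrix and keeping the k highest and k lowest token scores. The sketch below assumes that procedure. The function names `token_logits` and `top_and_bottom`, the 2-dimensional vectors, and the toy vocabulary are all hypothetical illustrations, not values or code taken from this page.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def token_logits(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the (hypothetical) feature
    decoder direction with that token's unembedding vector."""
    return {tok: sum(d * w for d, w in zip(decoder_dir, col)) for tok, col in unembed.items()}

def top_and_bottom(scores: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and k most negative tokens."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom

# Made-up toy model: 2-dimensional residual stream, four-token vocabulary.
decoder_dir = [1.0, 0.0]
unembed = {
    " machine": [0.11, 0.3],
    " shame": [0.08, -0.2],
    " Ortiz": [-0.07, 0.5],
    " redd": [-0.06, 0.1],
}
top, bottom = top_and_bottom(token_logits(decoder_dir, unembed), k=2)
print(top)     # [(' machine', 0.11), (' shame', 0.08)]
print(bottom)  # [(' Ortiz', -0.07), (' redd', -0.06)]
```

Real pipelines may also fold in a final layer norm or other scaling before the projection; this toy version omits that.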
    Activation Density: 0.025%
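Activation density is usually read as the fraction of sampled tokens on which the feature fires. A density of 0.025% corresponds to about 1 token in 4,000. The minimal sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and the sample data are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0

# Hypothetical sample: one firing token out of 4,000.
acts = [0.0] * 3999 + [1.5]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.025%
```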

    No Known Activations