INDEX
    Explanations
    NEGATIVE LOGITS
    Token           Logit
    .bulk           -0.06
    (public         -0.06
    "L              -0.06
     Trav           -0.06
    Paul            -0.06
     racially       -0.06
     tham           -0.06
    "M              -0.06
    (whitespace)    -0.06
    Mari            -0.06
    POSITIVE LOGITS
    Token           Logit
    agne            0.07
    aired           0.07
    efd             0.07
    (not rendered)  0.07
    dorf            0.06
    folder          0.06
     kost           0.06
     Authors        0.06
    -Jun            0.06
    oriasis         0.06
    Act Density 0.006%

    No Known Activations