INDEX
    Explanations
    NEGATIVE LOGITS
    Token              Logit
    " Benz"            -0.07
    " if"              -0.07
    "Feb"              -0.07
    " göl"             -0.07
    " nearly"          -0.07
    "yny"              -0.07
    " Coy"             -0.07
    " Cheney"          -0.06
    " enthusiastic"    -0.06
    ".Images"          -0.06
    POSITIVE LOGITS
    Token              Logit
    " Master"          0.19
    " master"          0.17
    "Master"           0.16
    " Masters"         0.15
    "master"           0.15
    " masters"         0.14
    " MASTER"          0.14
    "-master"          0.11
    "MASTER"           0.11
    "masters"          0.11
    Activation Density: 0.017%

    No Known Activations