    Negative Logits
        " Pharmaceuticals"   -0.07
        ".comment"           -0.07
        ":::::::"            -0.07
        " götür"             -0.06
        " Hamilton"          -0.06
        " london"            -0.06
        (blank)              -0.06
        " haunted"           -0.06
        "-associated"        -0.06
        "illion"             -0.06
    Positive Logits
        " Steven"    0.07
        " verbose"   0.07
        "poster"     0.06
        "WARN"       0.06
        " Greg"      0.06
        " zz"        0.06
        "norm"       0.06
        "ション"     0.06
        " Tutor"     0.06
        " ix"        0.06
    Activation Density: 0.211%

    No Known Activations