INDEX
    Explanations

    mentions of specific sports teams

    Negative Logits
    ORMAL         -0.16
    ADF           -0.15
    jac           -0.15
    šil           -0.15
    cth           -0.15
    ¯             -0.14
    ltre          -0.14
    .ly           -0.14
    dra           -0.14
    utherford     -0.14
    Positive Logits
    apo           0.15
    šk            0.15
    ož            0.14
    Mi            0.14
     Mi           0.14
    #line         0.14
    ylon          0.13
    ugi           0.13
     sông         0.13
    WC            0.13
    Act Density 0.037%

    No Known Activations