    Negative Logits
    Token          Logit
    ".movies"      -0.07
    ".org"         -0.07
    " mentre"      -0.07
    "μένη"         -0.06
    " zel"         -0.06
    " phối"        -0.06
    "ha"           -0.06
    " swo"         -0.06
    "ẳng"          -0.06
    "xico"         -0.06

    Positive Logits
    Token          Logit
    " unnamed"     0.09
    "Unnamed"      0.07
    "ические"      0.07
    "?:"           0.07
    "ointments"    0.07
    "ступ"         0.06
    "ofi"          0.06
    " fık"         0.06
    " Race"        0.06
    " ABI"         0.06
    Activation Density: 0.009%

    No Known Activations