INDEX
    Explanations

    public infrastructure

    Negative Logits
    (no token shown)  -0.07
    imator            -0.06
    (no token shown)  -0.06
     مکانی            -0.06
     irony            -0.06
     Broker           -0.06
    FRINGEMENT        -0.06
    oufl              -0.06
    igor              -0.06
    .tw               -0.06
    Positive Logits
    μί                0.07
     โรง              0.07
    =torch            0.07
    Apple             0.07
     правиль          0.07
     перш             0.06
     například        0.06
     Alvarez          0.06
     zie              0.06
     abolished        0.06
    Act Density 0.000%

    No Known Activations