    Explanations

    proper nouns and entities

    Negative Logits
    le          0.57
     дело       0.55
    liği        0.53
     aqueles    0.53
    lah         0.51
     แต่         0.51
     ότι        0.51
    ЛА          0.51
    𝐥           0.50
     staan      0.50
    Positive Logits
     Correa        0.52
     Strasbourg    0.51
     Passengers    0.49
     Conte         0.49
     Corruption    0.48
     Leg           0.47
     Skyscanner    0.47
     MPC           0.47
     Syracuse      0.46
     Titanic       0.46
    Activation Density: 0.025%

    No Known Activations