    Explanations

    terms related to governmental or diplomatic entities and their actions

    Negative Logits
    Token       Logit
    "ุ่"        -0.15
    "تي"        -0.15
    "ilet"      -0.15
    "inst"      -0.14
    "abd"       -0.14
    "olit"      -0.14
    "ij"        -0.14
    "★★"        -0.14
    "ыш"        -0.13
    "aze"       -0.13
    Positive Logits
    Token            Logit
    "ATAB"           0.17
    "POSITORY"       0.15
    "rende"          0.15
    "aptop"          0.15
    "pig"            0.14
    "ekk"            0.14
    "さま"           0.14
    " FactoryBot"    0.14
    "opsy"           0.13
    "ifold"          0.13
    Activation Density: 0.025%

    No Known Activations