INDEX
    Explanations

    phrases related to the effects and impacts of various factors on individuals or groups

    Negative Logits
    Token       Logit
    "dge"       -0.15
    ".op"       -0.15
    "tal"       -0.14
    "æĪ¶"       -0.14
    "hal"       -0.14
    "asan"      -0.14
    "íĺ¸"       -0.14
    "tos"       -0.14
    "adin"      -0.14
    "man"       -0.14
    Positive Logits
    Token          Logit
    "buat"         0.16
    "outcome"      0.16
    "-Identifier"  0.16
    " overall"     0.15
    "ajar"         0.15
    " outcome"     0.15
    "ekl"          0.15
    "ngör"         0.15
    "ä¹İ"          0.15
    "remium"       0.14
    Act Density 0.137%

    No Known Activations