INDEX
    Explanations

    occurrences of specific nouns and phrases related to policies and classifications

    Negative Logits
    "ừng"     -0.15
    "panies"  -0.15
    "imates"  -0.15
    "TED"     -0.14
    "ields"   -0.14
    "erer"    -0.14
    "cz"      -0.14
    "ept"     -0.14
    "kaar"    -0.14
    " force"  -0.13
    Positive Logits
    " Gle"    0.16
    "kit"     0.15
    "itsu"    0.15
    "anmar"   0.14
    "venir"   0.14
    "Ìī"      0.14
    "ian"     0.13
    "ÙĦعاب"   0.13
    "ience"   0.13
    " Furn"   0.13
    Act Density 0.002%

    No Known Activations