    Explanations

    expressions that denote close, emotionally intimate relationships or strong personal bonds between people.

    graphic or explicit descriptions of cannibalism (eating human flesh) or similarly gruesome content.

    Negative Logits
    0.33
     triumphant
    0.30
     murderous
    0.30
     elegan
    0.29
     auster
    0.29
    شاه
    0.28
     katalog
    0.28
     produkt
    0.28
     auspices
    0.28
     fondamentali
    0.27
    Positive Logits
    " 如果"    0.39
    " Pokud"   0.34
    "Nếu"      0.33
    "Pokud"    0.33
    " μπορεί"  0.33
    "if"       0.32
    " যদি"     0.32
    " moze"    0.32
    " можуть"  0.32
    "যদি"      0.31
    Activation Density: 0.098%

    No Known Activations