INDEX
    Explanations

    phrases related to responsibility and accountability

    Negative Logits
    oma
    -0.16
     hâl
    -0.15
    wij
    -0.15
    ocal
    -0.15
    allo
    -0.14
    å§
    -0.14
    age
    -0.14
    ekim
    -0.14
     Pratt
    -0.14
    rippling
    -0.13
    Positive Logits
    lish
    0.16
     Gül
    0.16
    allery
    0.15
    326
    0.14
    هوری
    0.14
    Fetcher
    0.14
    нич
    0.14
     punct
    0.14
    elsen
    0.14
    ्वत
    0.14
    Act Density 0.376%

    No Known Activations