INDEX
    Explanations

    references to various social and moral concepts

    Negative Logits
    Token        Logit
    "touches"    -0.15
    "ifa"        -0.14
    "顺"         -0.14
    "felt"       -0.13
    "æĮ¯"        -0.13
    "pga"        -0.13
    "feit"       -0.13
    "Facing"     -0.13
    "astes"      -0.13
    ".datas"     -0.13
    Positive Logits
    Token            Logit
    " dictate"       0.32
    " dict"          0.31
    "dict"           0.31
    " dictates"      0.30
    " Dict"          0.30
    " intervened"    0.27
    " dictated"      0.27
    "Dict"           0.26
    " interven"      0.26
    " cons"          0.26
    Activation Density: 0.212%

    No Known Activations