INDEX
    Explanations

    phrases that express beliefs about moral or religious authority

    Negative Logits
    " arguably"        -0.64
    "Ironically"       -0.57
    " Ironically"      -0.56
    "comprom"          -0.55
                       -0.54
    " redefine"        -0.54
    "UNSIGNED"         -0.53
    " Segen"           -0.52
    " caveats"         -0.52
    " defy"            -0.52

    Positive Logits
    "PreferredItem"    0.81
    "AndEndTag"        0.73
    " kollu"           0.65
    "haustible"        0.64
    "ItemBackground"   0.60
    "Abitanti"         0.60
    "TableBody"        0.60
    "Ecotoxicity"      0.57
    "]';"              0.56
    "InputBorder"      0.56
    Activation Density: 0.159%

    No Known Activations