    Explanations

    statements related to societal issues, morality, politics, and ethical behavior

    Negative Logits
        " solidar"      -0.82
        " demen"        -0.81
        " quí"          -0.81
        " promi"        -0.80
        " notor"        -0.80
        " umo"          -0.80
        " melat"        -0.79
        " dises"        -0.79
        " robus"        -0.76
        " albic"        -0.76
    Positive Logits
        " Therefore"     0.73
        " therefore"     0.71
        "Therefore"      0.65
        "therefore"      0.64
        " Whether"       0.61
        " Hence"         0.59
        " whether"       0.58
        " hence"         0.55
        " unless"        0.52
        " But"           0.52
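Logit lists like the two above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and keeping the tokens with the largest and smallest resulting values. The sketch below assumes that method; the vocabulary, the decoder vector `w_dec`, the unembedding `W_U`, and the helper names `logit_effects` and `top_and_bottom` are all hypothetical toy data, not the values behind this feature.

```python
# Hypothetical toy example: rank output tokens by the logit contribution of
# one feature, computed as w_dec @ W_U (assumed method, toy numbers).

def logit_effects(w_dec, W_U):
    """Return one logit contribution per vocabulary token (w_dec @ W_U).

    w_dec: list of d_model floats (the feature's decoder direction)
    W_U:   d_model rows, each a list of vocab_size floats (unembedding)
    """
    vocab_size = len(W_U[0])
    return [sum(w_dec[i] * W_U[i][t] for i in range(len(w_dec)))
            for t in range(vocab_size)]


def top_and_bottom(effects, vocab, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


vocab = [" Therefore", " hence", " solidar", " demen"]
w_dec = [1.0, -0.5]
W_U = [
    [0.6, 0.4, -0.7, -0.5],   # unembedding row for residual dimension 0
    [-0.2, -0.2, 0.2, 0.4],   # unembedding row for residual dimension 1
]

effects = logit_effects(w_dec, W_U)
positive, negative = top_and_bottom(effects, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once and slicing from both ends keeps the example short; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest` and `heapq.nsmallest`) would avoid a full sort.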
    Activation Density: 0.662%
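Activation density is usually read as the fraction of sampled tokens on which the feature fires. At 0.662%, that is about 6.6 tokens per 1,000. The sketch below assumes a flat list of per-token activations and a firing threshold of 0. Both the threshold and the helper name `activation_density` are hypothetical choices made for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: the feature fires on 3 of 500 tokens.
acts = [0.0] * 500
acts[10], acts[200], acts[499] = 1.3, 0.4, 2.1
density = activation_density(acts)
print(f"{density:.3%}")  # → 0.600%
```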

    No Known Activations