    Explanations

    phrases related to justice or moral judgment

    Negative Logits
    Token           Logit
    ĻĤ              -0.71
    ason            -0.66
    SPA             -0.65
     lapt           -0.63
     satell         -0.62
    iaries          -0.61
     advoc          -0.60
    worldly         -0.59
    acas            -0.58
    gard            -0.58

    Positive Logits
    Token           Logit
    /"              1.15
     referring      0.95
     ["             0.84
     implying       0.79
    [               0.75
     referencing    0.75
     refers         0.72
    ([              0.71
     meaning        0.71
     ("             0.70
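
    The page does not say how these logit values were computed. One common approach for learned features like this one, assumed here rather than confirmed by the source, is to multiply the feature's decoder direction by the model's unembedding matrix and rank the resulting per-token scores: the lowest scores form the negative list and the highest form the positive list. The minimal sketch below uses small made-up vectors; `top_logits`, `decoder_direction`, `unembedding`, and `vocab` are hypothetical names, not part of any dashboard API.

```python
from typing import List, Tuple

TokenScore = Tuple[str, float]


def top_logits(
    decoder_direction: List[float],
    unembedding: List[List[float]],  # shape: d_model rows x vocab_size columns
    vocab: List[str],
    k: int,
) -> Tuple[List[TokenScore], List[TokenScore]]:
    """Return the k most negative and the k most positive token scores.

    Assumption: a token's score is the dot product of the feature's
    decoder direction with that token's column of the unembedding matrix.
    """
    d_model = len(decoder_direction)
    # score[j] = sum_i direction[i] * W_U[i][j]
    scores = [
        sum(decoder_direction[i] * unembedding[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score: the front holds the most negative tokens.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


# Toy example with a 2-dimensional model and a 4-token vocabulary.
direction = [1.0, -0.5]
w_u = [
    [0.2, -0.4, 0.9, 0.0],
    [0.6, 0.2, -0.2, 0.4],
]
tokens = ["a", "b", "c", "d"]
neg, pos = top_logits(direction, w_u, tokens, k=2)
print("negative:", [(t, round(s, 2)) for t, s in neg])
print("positive:", [(t, round(s, 2)) for t, s in pos])
```

    Real dashboards may add steps this sketch omits (for example, a final normalization before the unembedding), so treat it only as an illustration of the ranking idea.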
    Activation Density: 0.655%
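
    The page does not define activation density. A common definition, assumed here, is the percentage of token positions on which the feature's activation is above zero; under that reading, 0.655% means the feature fires on roughly 6.55 of every 1,000 tokens. The sketch below implements that definition; `activation_density` and its `threshold` parameter are hypothetical, and the activation values are made up.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" on a token when its activation is strictly
    greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("need at least one token position")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: the feature fires on 2 of 8 token positions.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"{activation_density(acts):.3f}%")  # 25.000%
```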

    No Known Activations