INDEX
    Explanations

    phrases related to moral and ethical considerations

    Negative Logits
    "gdala"       -0.70
    " Zip"        -0.62
    " Rost"       -0.61
    " Mamm"       -0.60
    " Democr"     -0.60
    "izon"        -0.60
    "wave"        -0.59
    " Lars"       -0.58
    "illusion"    -0.55
    " Guilty"     -0.54
    Positive Logits
    " attention"     0.96
    "lessly"         0.89
    " scrutiny"      0.82
    "ENTION"         0.78
    "FINE"           0.77
    " tweaking"      0.76
    " updating"      0.74
    " Attention"     0.73
    " correction"    0.72
    " repairs"       0.71
    Activation Density: 0.100%

    No Known Activations