INDEX
    Explanations

    phrases related to advocating for rights and standing up for beliefs

    Negative Logits
    "ipment"          -0.55
    "sterdam"         -0.55
    "requisite"       -0.55
    "lav"             -0.52
    "velop"           -0.51
    "ueller"          -0.51
    "Course"          -0.51
    "availability"    -0.51
    " simulator"      -0.51
    "artment"         -0.49
    Positive Logits
    "ĪĴ"              0.71
    " whistleblowers" 0.69
    " courageous"     0.65
    " brave"          0.65
    " injust"         0.63
    " Against"        0.62
    " principled"     0.61
    " boldly"         0.60
    " courage"        0.58
    " humanity"       0.57
    Activation Density: 12.373%

    No Known Activations