    Explanations

    adjectives expressing lack of harm or danger

    terms related to the concepts of harmlessness and benignity

    Negative Logits
    Token        Logit
    "KER"        -0.70
    "GPU"        -0.69
    "lining"     -0.66
    "funding"    -0.64
    "Cla"        -0.64
    "Pain"       -0.62
    "pain"       -0.62
    "ingo"       -0.61
    "lin"        -0.61
    "Connell"    -0.60
    Positive Logits
    Token           Logit
    " harmless"     1.04
    " innocuous"    0.92
    " benign"       0.85
    "»Ĵ"            0.81
    " minded"       0.81
    "alty"          0.78
    " bystand"      0.77
    "ality"         0.76
    "mate"          0.71
    "ishable"       0.69
    Activation Density: 0.016%

    No Known Activations