INDEX
    Explanations

    elements related to environmental and sustainability discussions

    Negative Logits
    LEncoder        -0.80
    ambilan         -0.79
     يتيمه          -0.76
     Italijanski    -0.75
    دانشنامهٔ       -0.75
     alphabetical   -0.73
    aarrggbb        -0.73
     $_"            -0.72
     downvotes      -0.72
    UnsafeEnabled   -0.72
    Positive Logits
    [toxicity=0]    0.86
    <bos>           0.79
                    0.79
    \               0.63
    ↵↵              0.59
    ↵↵↵             0.59
    ">)</           0.57
    </tr>           0.52
    "               0.52
    )               0.52
    Activation Density: 0.050%

    No Known Activations