INDEX
    Explanations

    references to feelings of dependence or addiction related to substances

    Negative Logits
    'Билгалдахарш'    -1.02
    ' tartalomajánló'    -0.93
    ' Majefty'    -0.89
    'Personensuche'    -0.88
    'دانشنامهٔ'    -0.88
    'Autoritní'    -0.87
    'MigrationBuilder'    -0.87
    ' $_"'    -0.85
    ' myſelf'    -0.85
    ' ProtoMessage'    -0.85
    Positive Logits
    '[toxicity=0]'    1.02
    '<'    1.00
    ' Q'    0.88
    'Q'    0.88
    '['    0.86
    (whitespace token)    0.83
    ' <'    0.82
    (token not shown)    0.79
    ' ['    0.78
    (token not shown)    0.73
    Act Density 0.753%

    No Known Activations