INDEX
    Explanations

    references to political or social criticism regarding specific groups or situations

    Negative Logits
    " myſelf"            -0.94
    "MigrationBuilder"   -0.90
    " itſelf"            -0.89
    "MLLoader"           -0.85
    " disambiguazione"   -0.81
    " houſe"             -0.76
    " ModelExpression"   -0.76
    " iſt"               -0.76
    "InjectAttribute"    -0.75
    " pleaſure"          -0.74
    Positive Logits
    " idiotic"       0.60
    "https"          0.59
    " propaganda"    0.58
    " fascist"       0.57
    " moron"         0.57
    " incompetent"   0.56
    "😡"             0.56
    " morons"        0.55
    " stupidity"     0.55
    " https"         0.55
    Act Density 1.458%

    No Known Activations