INDEX
    Explanations

    hate crimes and vandalism

    Negative Logits
     Indust       -0.07
    עכשיו         -0.07
    عي            -0.07
    中外          -0.06
    _bt           -0.06
    _ALREADY      -0.06
    Adv           -0.06
    .point        -0.06
    .market       -0.06
     trends       -0.06
    Positive Logits
     spender      0.08
     giọng        0.07
                  0.07
    🏏            0.07
    槿            0.07
     umożliwia    0.07
    士兵          0.07
    =forms        0.07
    超标          0.07
    rij           0.07
    Act Density 0.054%

    No Known Activations