INDEX
    Explanations

    words related to breaking or failure

    references to the term "bust" and its variations in different contexts

    Negative Logits
        " vomit"       -0.67
        "WAYS"         -0.65
        " Rouge"       -0.64
        " Cruel"       -0.62
        "ised"         -0.62
        " Pradesh"     -0.59
        "mble"         -0.59
        "dfx"          -0.58
        "ndra"         -0.58
        " vomiting"    -0.57
    Positive Logits
        "le"            0.95
        "lar"           0.92
        "buster"        0.90
        "aign"          0.89
        "enegger"       0.89
        "neck"          0.89
        "y"             0.88
        "cies"          0.88
        "ards"          0.87
        "les"           0.87
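The two lists are ranked selections: the tokens this feature pushes up most (positive) and down most (negative). A minimal sketch of that selection, assuming each vocabulary token already has a single scalar logit effect (how that score is computed is not stated here). The function name `top_and_bottom` is hypothetical, and the sample uses a few values copied from the lists, where a leading space is part of the token.

```python
import heapq

def top_and_bottom(logit_effects: dict[str, float], k: int = 10):
    """Return the k tokens with the largest and the k with the smallest logit effects."""
    # nlargest returns items in descending order of score
    top = heapq.nlargest(k, logit_effects.items(), key=lambda kv: kv[1])
    # nsmallest returns items in ascending order, so the most negative comes first
    bottom = heapq.nsmallest(k, logit_effects.items(), key=lambda kv: kv[1])
    return top, bottom

# Sample values taken from the lists above (assumed scores, not the full vocabulary)
effects = {" vomit": -0.67, "WAYS": -0.65, " Rouge": -0.64,
           "le": 0.95, "lar": 0.92, "buster": 0.90}
top, bottom = top_and_bottom(effects, k=2)
print(top)     # [('le', 0.95), ('lar', 0.92)]
print(bottom)  # [(' vomit', -0.67), ('WAYS', -0.65)]
```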
    Activation Density: 0.039%
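Activation density is read here as the share of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the dashboard does not state its threshold or sample size). The function `activation_density` and the 100,000-token sample are hypothetical and serve only to show the arithmetic: 39 active positions out of 100,000 gives 0.039%.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical sample: 39 nonzero activations spread over 100,000 tokens
sample = [0.0] * 100_000
for i in range(39):
    sample[i * 2500] = 1.3
print(f"{activation_density(sample):.3f}%")  # 0.039%
```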

    No Known Activations