INDEX
    Explanations

    the word "mod" with varying degrees of strength

    the presence of the word "mod" and related terms associated with moderation or modification

    Negative Logits
    vana          -0.75
    ibaba         -0.71
    jriwal        -0.65
    ï¸            -0.64
    ©¶æ           -0.63
    ISA           -0.61
     Flavoring    -0.61
    Enlarge       -0.60
    mble          -0.60
    Adapt         -0.59
    Positive Logits
    erella         0.75
    opol           0.72
    ooth           0.65
    ighters        0.65
    ail            0.61
    oin            0.61
    ruck           0.60
    asty           0.60
    nant           0.59
    ude            0.58
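    The ranked lists above can be illustrated with a minimal sketch. It assumes (this is not stated on the page) that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector, and that the lists show the k highest and k lowest scores. The helper names, the 2-d vectors, and the three-token vocabulary are hypothetical toy values, not real model weights.

```python
from typing import Dict, List, Tuple

# Assumption: score(token) = decoder_dir . unembed[token]
def logit_scores(decoder_dir: List[float],
                 unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Project a hypothetical feature direction onto each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_bottom(scores: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]        # ascending: most negative first
    positive = ranked[::-1][:k]  # descending: most positive first
    return positive, negative

# Toy example: hypothetical 2-d unembeddings for three tokens.
unembed = {"erella": [1.0, 0.5], "vana": [-1.0, 0.2], "mble": [0.1, -0.3]}
pos, neg = top_bottom(logit_scores([0.6, 0.3], unembed), k=1)
```

    Here "erella" scores 0.6 * 1.0 + 0.3 * 0.5 = 0.75 and lands at the top of the positive list, while "vana" scores -0.54 and heads the negative list.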
    Activation Density 0.158%
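    Activation density is read here as the percentage of tokens on which the feature fires. That definition, the zero threshold, and the function name below are assumptions for illustration. Under that reading, 0.158% works out to about 158 of every 100,000 tokens.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds the threshold (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Hypothetical sample: 158 firing tokens out of 100,000.
sample = [1.0] * 158 + [0.0] * (100_000 - 158)
density = activation_density(sample)  # 100 * 158 / 100000
```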

    No Known Activations