INDEX
    Explanations

    words related to strong opinions or emphasis

    Negative Logits
    Token           Logit
    "ioned"         -0.77
    "bourg"         -0.71
    "ulton"         -0.71
    "ulative"       -0.71
    "isson"         -0.71
    "jri"           -0.70
    "engers"        -0.70
    "iem"           -0.69
    "tein"          -0.69
    "NetMessage"    -0.69
    Positive Logits
    Token           Logit
    "ove"           0.71
    "reme"          0.71
    " bananas"      0.67
    "urious"        0.67
    " nuts"         0.67
    " fucking"      0.66
    " delighted"    0.65
    " Vader"        0.65
    "ĪĴ"            0.65
    " adore"        0.64
    Activation Density: 0.025%

    No Known Activations