    Explanations

    sentences with positive affirmations or praises

    expressions of pride and recognition

    Negative Logits
     Rothschild     -0.65
    miah            -0.63
    bats            -0.62
     contrace       -0.62
    ombie           -0.60
     Stras          -0.59
    ibrary          -0.58
     monarchy       -0.58
     Telesc         -0.58
     mathemat       -0.58
    Positive Logits
     Deliver        1.51
    ices            0.73
    ounters         0.72
     Corpus         0.69
    ãĥ¼ãĥĨ          0.67
     Advertisement  0.66
    ãĥ¼ãĥĨãĤ£       0.64
    omet            0.64
     to             0.64
    ibaba           0.62
    Activation Density 0.000%

    No Known Activations