INDEX
    Explanations

    phrases expressing strong opinions or beliefs

    Negative Logits
    "oled"        -0.77
    "blem"        -0.76
    "omsky"       -0.72
    "oling"       -0.71
    "=~=~"        -0.69
    "ositories"   -0.68
    "ernels"      -0.67
    "oğan"        -0.67
    "lia"         -0.63
    "Newsletter"  -0.62
    Positive Logits
    " goodbye"    1.14
    " bye"        0.98
    " aloud"      0.84
    "lihood"      0.75
    " amen"       0.69
    " publicly"   0.69
    " hello"      0.67
    " loudly"     0.62
    "YN"          0.62
    " sorry"      0.61
    Activation Density: 0.065%

    No Known Activations