    Explanations

    friendly greetings and positive emojis

    (␣ marks a leading space in the token)

    Negative Logits
        )....       1.00
        ...'        0.98
        ...).       0.96
        ...."       0.95
        ...',       0.94
        ']."        0.93
        )...        0.89
        ..."        0.89
        '...        0.88
        ␣.......    0.88

    Positive Logits
        ␣😊         1.53
        ␣:)         1.43
        ␣🙂         1.40
        😊          1.28
        ␣😀         1.18
        ␣😄         1.14
        ␣🤗         1.10
        ␣🥰         1.08
        ␣😁         1.06
        ☺️          1.06
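    Dashboards of this kind commonly build the two logit lists by taking the dot product of the feature's decoder direction with each token's unembedding vector, then keeping the largest and smallest scores. The sketch below assumes that convention. The function names, the toy two-dimensional vectors, and the toy vocabulary are hypothetical and are not this feature's real weights.

```python
# Hypothetical sketch: rank tokens by how strongly a feature's decoder
# direction raises or lowers their logits (assumed dashboard convention).
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_logits(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k highest-scoring tokens and the k lowest-scoring tokens."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

# Toy data (hypothetical): a 2-d residual stream and a 4-token vocabulary.
decoder_dir = [1.0, 0.5]
unembed = {
    " 😊": [1.25, 0.5],
    " :)": [1.0, 0.75],
    ")....": [-0.75, -0.5],
    "hello": [0.25, 0.0],
}

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_logits(effects, k=2)
print(positive)  # highest-scoring tokens first
print(negative)  # lowest-scoring tokens first
```

    With these toy vectors, " 😊" scores 1.5 and ")...." scores -1.0, so they head the positive and negative lists respectively.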
    Activation Density 0.930%
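    Activation density is usually defined as the percentage of sampled tokens on which the feature's activation is above zero. Under that definition, 0.930% means the feature fires on roughly 9 of every 1,000 tokens. The sketch below assumes this definition. The `threshold` parameter and the toy activations are hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of tokens on
# which a feature's activation exceeds a threshold (assumed definition).
from typing import Iterable

def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    acts = list(activations)
    if not acts:
        return 0.0  # no tokens sampled, so report zero density
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)

# Toy batch (hypothetical): 1,000 tokens, of which 9 activate the feature.
toy_acts = [0.0] * 991 + [0.7] * 9
density = activation_density(toy_acts)
print(f"Activation Density {density:.3f}%")
```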

    No Known Activations