INDEX
    Explanations

    humor and playful language

    Negative Logits
    " –↵↵"           -0.19
    "디시"            -0.16
    "isContained"    -0.15
    " 네이트"         -0.15
    (not shown)      -0.15
    ":↵↵"            -0.14
    " وذلك"          -0.14
    "데이트"          -0.14
    (not shown)      -0.14
    ":↵↵↵"           -0.14
    Positive Logits
    " ppl"           0.18
    " [="            0.17
    " ..."           0.16
    " ...,"          0.15
    "/thread"        0.15
    " ''"            0.15
    " ...)"          0.15
    " _"             0.15
    " Reply"         0.15
    "thread"         0.15
    Activation Density: 3.981%

    No Known Activations