INDEX
    Explanations

    quotation marks indicating dialogue or quotes

    Negative Logits
    -0.39
     „
    -0.38
     (“
    -0.30
     “[
    -0.29
    「あ
    -0.26
    「
    -0.25
     「
    -0.23
    「お
    -0.23
     ``
    -0.23
    “We
    -0.22
    Positive Logits
    []"
    0.26
    ()"
    0.24
    ()",
    0.21
    ¦
    0.20
    ."↵↵
    0.20
    ()"↵
    0.19
    !",
    0.18
    !"
    0.18
    ();"
    0.18
    ?",
    0.18
    Act Density 0.488%

    No Known Activations