INDEX
    Explanations

    references to creative storytelling elements and character descriptions

    Negative Logits
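        (missing)   -0.96
        ' —↵'       -0.75
        (missing)   -0.69
        ' —↵↵'      -0.65
        ' −'        -0.65   (U+2212 minus sign)
        ' –↵'       -0.52
        ' ‐'        -0.49   (U+2010 hyphen)
        ' ―'        -0.46   (U+2015 horizontal bar)
        ' －'       -0.43   (U+FF0D fullwidth hyphen-minus)
        ' Â'        -0.42   (lone byte 0xC2, an incomplete UTF-8 sequence)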
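    The page draws some tokens in GPT-2's byte-level BPE form, where every raw byte is shown as a printable stand-in character. A string like âĪĴ is therefore the three bytes E2 88 92, the UTF-8 encoding of U+2212 (−). The sketch below inverts that mapping, assuming the model uses GPT-2's byte-to-unicode table and that the page draws a newline as "↵". The names decode_token and DISPLAY_MARKERS are hypothetical.

```python
def bytes_to_unicode():
    """Byte -> stand-in character table used by GPT-2's byte-level BPE."""
    # Printable Latin-1 bytes stand for themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point from U+0100 up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


STAND_IN_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}
# Assumption: the dashboard renders a newline as "↵".
DISPLAY_MARKERS = {"↵": b"\n"}


def decode_token(shown: str) -> str:
    """Turn a token string as displayed on the page back into readable text."""
    out = bytearray()
    for ch in shown:
        if ch in DISPLAY_MARKERS:
            out += DISPLAY_MARKERS[ch]
        elif ch in STAND_IN_TO_BYTE:
            out.append(STAND_IN_TO_BYTE[ch])
        else:
            # Already a real character (e.g. a plain space or an em dash).
            out += ch.encode("utf-8")
    # A lone lead byte such as 0xC2 cannot decode on its own and becomes U+FFFD.
    return out.decode("utf-8", errors="replace")
```

    For example, decode_token(" âĪĴ") returns " −" (U+2212), and decode_token(" ï¼į") returns " －" (U+FF0D). Read this way, every captured negative-logit token is a non-ASCII dash or minus sign.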
    Positive Logits
        '--'        0.89
        '--↵'       0.69
        ')--'       0.67
        '"--'       0.65
        '--['       0.62
        '!--'       0.59
        '--,'       0.55
        '.--'       0.53
        '--↵↵'      0.51
        '--)'       0.47
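    The sketch below assumes the page follows the common SAE-dashboard convention: it projects the feature's decoder vector through the model's unembedding matrix W_U and lists the tokens with the largest and smallest scores. The function name, the toy vocabulary, and every number are hypothetical. Real dashboards may also fold in the final LayerNorm or apply other normalization.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by how much one feature direction pushes their logits.

    decoder_dir: list of d_model floats (the feature's decoder vector).
    unembed:     d_model rows, each a list of len(vocab) floats (W_U).
    Returns (most positive, most negative), each a list of (token, score).
    """
    scores = []
    for t, tok in enumerate(vocab):
        # Dot product of the decoder vector with token t's unembedding column.
        score = sum(decoder_dir[d] * unembed[d][t] for d in range(len(decoder_dir)))
        scores.append((tok, score))
    ranked = sorted(scores, key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # largest scores first
    negative = ranked[:k]        # most negative scores first
    return positive, negative


# Toy example with made-up numbers: d_model = 2, vocabulary of 4 tokens.
vocab = ["--", "--\n", " \u2014", "the"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.9, 0.1, -0.8, -0.1],
    [0.2, -0.4, 0.6, 0.0],
]
positive, negative = top_logit_effects(decoder_dir, unembed, vocab, k=2)
```

    In this toy setup "--" scores 0.8 and " —" scores -1.1, which mirrors the page's pattern of ASCII double hyphens on the positive side and Unicode dashes on the negative side.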
    Activation Density: 0.104%
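    The activation density figure is read here as the share of sampled tokens on which the feature's activation is above zero. That reading is an assumption about this dashboard's definition, and the threshold parameter below is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature's activation exceeds threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Made-up activations over 8 tokens: the feature fires on 2 of them.
example = [0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0]
density = activation_density(example)  # 2 / 8 = 0.25, i.e. 25%
```

    At the reported 0.104%, the feature would be active on about 1,040 of every million tokens, or roughly one token in 960.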

    No Known Activations