INDEX
    Explanations

    phrases indicating that a citation is required

    phrases indicating citations or references that require attribution

    Negative Logits
    morrow          -0.63
    ipation         -0.62
     destructive    -0.61
    Tokens          -0.61
    cipled          -0.59
    Shop            -0.59
    ochond          -0.58
    oult            -0.57
    animate         -0.57
    akuya           -0.56
    Positive Logits
    ]               0.84
    }.              0.81
    *)              0.79
    ]:              0.76
     redacted       0.76
     ]              0.75
    )]              0.74
    ]"              0.73
    >)              0.73
     omitted        0.73
    Activation Density: 0.054%

    No Known Activations