INDEX
Explanations
phrases indicating that a citation is required
phrases indicating citations or references that require attribution
Negative Logits
morrow         -0.63
ipation        -0.62
destructive    -0.61
Tokens         -0.61
cipled         -0.59
Shop           -0.59
ochond         -0.58
oult           -0.57
animate        -0.57
akuya          -0.56
Positive Logits
]              0.84
}.             0.81
*)             0.79
]:             0.76
redacted       0.76
]              0.75
)]             0.74
]"             0.73
>)             0.73
omitted        0.73
Activations Density: 0.054%