INDEX
Explanations
phrases related to communication and information sharing
phrases related to communication or requests for input
Negative Logits
Distance       -0.55
animate        -0.54
draining       -0.53
predators      -0.51
enery          -0.51
raping         -0.50
Vegeta         -0.49
murdering      -0.49
hurting        -0.49
Females        -0.49
Positive Logits
archive         0.78
published       0.74
publish         0.73
redacted        0.73
reader          0.73
informative     0.73
editor          0.71
publication     0.70
edited          0.67
excerpts        0.67
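Dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and reporting the most positive and most negative entries. The sketch below assumes that definition. The function names `logit_effects` and `top_and_bottom`, the d_model x vocab layout of `unembed`, and the toy numbers are all hypothetical; none of them come from this feature.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder direction with each vocabulary column.

    Assumes `unembed` is laid out as d_model rows x vocab columns
    (a hypothetical stand-in for a model's unembedding matrix).
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("decoder_dir and unembed disagree on d_model")
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(effects) != len(tokens):
        raise ValueError("effects and tokens must have the same length")
    order = sorted(range(len(effects)), key=lambda t: effects[t])  # ascending
    bottom = [(tokens[t], effects[t]) for t in order[:k]]  # most negative first
    top = [(tokens[t], effects[t]) for t in reversed(order[-k:])]  # most positive first
    return top, bottom


if __name__ == "__main__":
    # Toy data (hypothetical): d_model = 2, vocabulary of four tokens.
    tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
    effects = logit_effects([1.0, 0.5], [[0.5, -0.2, 0.1, 0.0],
                                         [0.4, -0.6, 0.2, 0.1]])
    top, bottom = top_and_bottom(effects, tokens, k=2)
    print("positive:", top)
    print("negative:", bottom)
```

Sorting the whole vocabulary keeps the sketch short. For a real vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` would avoid the full sort.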
Activation Density: 1.988%
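Assuming activation density means the share of sampled tokens on which the feature's activation is above zero, 1.988% corresponds to roughly 2 tokens in every 100. The sketch below shows that calculation. `activation_density` and its `threshold` parameter are hypothetical names, and the sample is synthetic rather than taken from this feature.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumes density is counted over a sample of per-token activations
    and that "active" means strictly greater than the threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Synthetic sample: 20 of 1,000 tokens fire.
    sample = [0.0] * 980 + [0.3] * 20
    print(f"{activation_density(sample):.3f}%")  # 2.000%
```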