Explanations
authors and studies mentioned in research papers
phrases that indicate authorship or references to reports and studies
Negative Logits
pring         -0.70
llan          -0.65
twitch        -0.63
Enlarge       -0.62
dude          -0.62
hunt          -0.61
vibration     -0.61
broom         -0.61
RL            -0.60
_-            -0.59
Positive Logits
books          0.78
articles       0.77
Letters        0.74
letters        0.74
bestselling    0.72
Ô              0.72
novels         0.71
memoir         0.71
unpublished    0.70
poems          0.69
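Tables like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below illustrates that idea with toy numbers. It is an assumption about how such tables can be built, not a description of this dashboard's exact pipeline (some tools apply extra normalization first), and the names `logit_effects`, `top_and_bottom`, `decoder_dir` and `W_U` are hypothetical.

```python
# Hypothetical sketch: derive "positive/negative logit" tables for a feature
# by taking the dot product of its decoder direction with each unembedding column.

def logit_effects(decoder_dir, W_U, vocab):
    """Return (token, score) pairs where score = decoder_dir . W_U[:, j]."""
    scores = []
    for j, tok in enumerate(vocab):
        s = sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
        scores.append((tok, s))
    return scores

def top_and_bottom(scores, k):
    """Return the k highest-scoring tokens (descending) and the k lowest (ascending)."""
    ordered = sorted(scores, key=lambda pair: pair[1])
    return ordered[-k:][::-1], ordered[:k]

# Toy example: d_model = 2, vocabulary of 4 tokens (values are made up).
vocab = ["books", "novels", "twitch", "broom"]
decoder_dir = [1.0, 0.5]
W_U = [
    [0.6, 0.5, -0.4, -0.3],   # row for hidden dimension 0
    [0.2, 0.1, -0.2, -0.6],   # row for hidden dimension 1
]

positive, negative = top_and_bottom(logit_effects(decoder_dir, W_U, vocab), k=2)
print("positive:", [tok for tok, _ in positive])
print("negative:", [tok for tok, _ in negative])
```

With these toy values the positive list is books then novels and the negative list is broom then twitch, mirroring how the real tables above rank tokens by signed score.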
Activation density: 0.078%
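Activation density is usually read as the fraction of token positions on which the feature fires. Assuming "fires" means an activation strictly above zero (the threshold is an assumption, and `activation_density` is a hypothetical helper), a density of 0.078% corresponds to roughly 78 active tokens per 100,000, as this sketch shows:

```python
# Hypothetical sketch: activation density = share of token positions where
# the feature's activation exceeds a threshold (assumed here to be 0.0).

def activation_density(activations, threshold=0.0):
    """Return the fraction of positions with activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Toy data: 100,000 token positions, of which 78 have a positive activation.
acts = [0.0] * 99_922 + [1.5] * 78
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```

A feature this sparse is silent on nearly all tokens, which is consistent with the narrow theme in the explanations (references to authors, reports and studies).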