Explanations
pronouns and possessive words
references to individuals and their roles in various scenarios
Negative Logits
venge              -0.80
righteous          -0.69
GOODMAN            -0.68
triumphant         -0.65
STON               -0.64
liber              -0.64
irming             -0.63
ovy                -0.63
congratulations    -0.63
knit               -0.63
Positive Logits
lacked              1.84
lacks               1.62
failed              1.54
cannot              1.51
forgot              1.46
underestimated      1.42
refused             1.39
incorrectly         1.38
failed              1.38
couldn             1.37
Activation density: 0.730%