INDEX
Explanations
pronouns followed by verbs in past tense
references to people expressing their desires or needs
Negative Logits
Roose       -0.68
aston       -0.65
phans       -0.63
Craw        -0.61
mop         -0.59
elia        -0.58
haps        -0.58
ixels       -0.58
Mont        -0.57
paragraph   -0.57
Positive Logits
wrought      0.85
'd           0.79
happ         0.78
've          0.77
happened     0.76
self         0.76
happen       0.75
wanted       0.73
preached     0.73
're          0.73
Activation Density: 0.168%