Explanations
repeated conjunctions and affirming phrases in dialogue
Negative Logits
ſelf           -0.75
ddelweddau     -0.69
줌             -0.61
itſelf         -0.61
myſelf         -0.61
ArrowToggle    -0.60
weakSelf       -0.60
('');          -0.59
msglen         -0.58
ſelves         -0.57
Positive Logits
And            0.74
it             0.69
everybody      0.68
you            0.67
And            0.66
we             0.64
they           0.63
if             0.61
everyone       0.60
commit         0.59
Activations Density: 0.319%