INDEX
Explanations
mention of family members
conjunctions and repeated phrases emphasizing connection and continuity
Negative Logits
oward         -0.93
prise         -0.76
itatively     -0.75
ggles         -0.75
utt           -0.73
ruce          -0.72
ucc           -0.72
ulent         -0.70
uts           -0.70
resh          -0.69
Positive Logits
therefore      1.36
hence          1.17
thus           1.17
consequently   1.07
cannot         0.97
nobody         0.90
incapable      0.90
prone          0.88
secondly       0.86
enjoys         0.85
Activations Density 0.289%