INDEX
Explanations
proper nouns or names in sentences
instances of the word "the" at the beginning of phrases
Negative Logits
Deal            -0.68
Availability    -0.67
interstitial    -0.67
Cause           -0.63
Reward          -0.63
terness         -0.61
num             -0.61
Upload          -0.59
UF              -0.58
Tomorrow        -0.57
Positive Logits
however          1.40
meanwhile        1.25
huh              1.14
though           1.00
moreover         1.00
albeit           0.91
although         0.87
unsurprisingly   0.84
therefore        0.83
alas             0.83
Activations Density: 1.126%