Explanations
proper nouns like names of people, places, and organizations
Negative Logits
Token        Logit
");          -0.70
tsy          -0.62
").          -0.61
scl          -0.58
"/>          -0.57
vale         -0.56
');          -0.56
ocalypse     -0.56
nih          -0.55
atile        -0.55
Positive Logits
Token            Logit
meanwhile        1.43
however          1.28
moreover         1.15
alas             0.98
unsurprisingly   0.93
huh              0.91
incidentally     0.89
albeit           0.86
along            0.86
therefore        0.86
Activations Density 1.764%
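
The page does not define the density figure. A minimal sketch, assuming it means the fraction of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and do not come from the page:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 18 of them with a nonzero activation.
toy = [0.0] * 982 + [0.5] * 18
print(f"{activation_density(toy):.3%}")  # prints 1.800%
```

Under this reading, a density of 1.764% would mean the feature fires on roughly 18 of every 1,000 sampled tokens.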