INDEX
Explanations
punctuation marks that indicate enumeration or lists
Negative Logits
arie         -0.61
ropolitan    -0.61
yon          -0.61
uber         -0.59
Moines       -0.57
gow          -0.57
paren        -0.56
orget        -0.55
vae          -0.54
irs          -0.54
Positive Logits
prompting     1.18
hence         1.09
thus          1.04
resulting     1.03
respectively  1.02
although      1.02
albeit        1.01
implying      1.00
namely        0.99
whereas       0.99
Activations Density: 0.274%