INDEX
Explanations
punctuation and formatting elements in the text
Negative Logits
Extern    -0.17
inha      -0.15
abar      -0.15
STEM      -0.14
obe       -0.14
Ulus      -0.14
ASM       -0.13
rrha      -0.13
ils       -0.13
tera      -0.13
Positive Logits
izon      0.16
vier      0.16
eldorf    0.15
vation    0.15
iona      0.15
Oliv      0.14
ilon      0.14
uggy      0.14
aks       0.14
rowsable  0.13
Activations Density 0.001%