INDEX
Explanations
phrases with structured language, possibly indicating formal or persuasive writing
instances of prose and references to bread-related topics
Negative Logits
Token               Logit
arov                -0.85
DERR                -0.82
ATES                -0.72
ALE                 -0.71
ARDS                -0.68
guiActiveUnfocused  -0.65
ered                -0.64
IENT                -0.64
IELD                -0.64
hap                 -0.63
Positive Logits
Token               Logit
ctors               0.96
prose               0.91
ffer                0.90
guiActiveUn         0.89
itably              0.89
ilk                 0.81
ly                  0.79
zai                 0.76
writer              0.76
itable              0.75
Activations Density: 0.009%