INDEX
Explanations
instructions or prompts to write
Negative Logits
Ĭ±       -0.89
agara    -0.84
Unsure   -0.67
Ukrain   -0.66
allows   -0.66
rises    -0.66
nels     -0.65
negie    -0.64
arov     -0.64
EGA      -0.63
Positive Logits
writing     0.89
itatively   0.86
smanship    0.85
letters     0.84
journal     0.84
lishing     0.83
penned      0.80
memos       0.80
writer      0.78
essays      0.77
Activations Density 1.858%