INDEX
Explanations
instances of writing or authorship
Negative Logits
cke       -0.15
gger      -0.15
Barrett   -0.14
xee       -0.14
-bound    -0.14
ward      -0.14
arro      -0.14
ajo       -0.14
trav      -0.14
SCALE     -0.14
Positive Logits
Bob       0.15
wing      0.15
patt      0.15
uren      0.14
inh       0.14
aza       0.14
shall     0.14
IFS       0.14
ins       0.14
asic      0.14
Activations Density: 0.019%