INDEX
Explanations
parentheses and square brackets used in written text
parentheses and brackets within the text
Negative Logits
cog      -0.87
buoy     -0.83
Cth      -0.79
redu     -0.76
Puzz     -0.73
wrink    -0.73
creep    -0.71
dense    -0.70
embr     -0.70
brim     -0.70
Positive Logits
sic        1.85
laughs     1.44
emphasis   1.27
insert     1.25
the        1.24
exec       1.20
ex         1.17
police     1.17
his        1.16
another    1.15
Activations Density 0.060%