INDEX
Explanations
references to products and brands related to personal care and household items
after personal pronouns
brand names and titles
NEGATIVE LOGITS
ſind       -0.79
auffi      -0.72
quæ        -0.70
avoient    -0.70
[],        -0.66
"';        -0.66
―――――      -0.65
[{         -0.64
—,         -0.64
faſt       -0.64
POSITIVE LOGITS
I          0.92
stuff      0.81
thing      0.77
though     0.70
if         0.69
didnt      0.68
but        0.67
とか       0.67
just       0.67
thats      0.66
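The two lists above rank vocabulary tokens by how much this feature pushes their logits up or down. Here is a minimal sketch of how such lists are commonly derived for sparse-autoencoder features. It assumes, without confirmation from this page, that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The names `logit_effects`, `top_and_bottom_logits`, `decoder_direction`, and `unembed` are hypothetical. The toy vectors in the usage note are illustrative only and are not this feature's data.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_direction: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Score each token by the dot product of the (hypothetical) feature
    decoder direction with that token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom_logits(
    effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative): the k highest-scoring tokens in
    descending order, and the k lowest-scoring tokens in ascending order."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative
```

For example, `top_and_bottom_logits(logit_effects([1.0, 0.0], {"a": [0.9, 5.0], "b": [-0.8, 1.0], "c": [0.1, 0.0]}), k=1)` returns `[("a", 0.9)]` as the positive list and `[("b", -0.8)]` as the negative list. A real dashboard would run this over the full vocabulary with k = 10, matching the ten entries shown per list.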
Activation density: 0.434%
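Activation density is read here, as an assumption not stated on this page, as the percentage of sampled tokens on which the feature fires with an activation above zero. On that reading, 0.434% means roughly 4 tokens in every 1,000. The sketch below computes the statistic from a list of per-token activations. The function name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`
    (assumed definition of activation density)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)
```

For example, `activation_density([0.0, 0.0, 1.2, 0.0])` returns `25.0`, because one of four tokens is active.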