INDEX
Explanations
instances of the word "to."
Head Attr Weights
0:0.08
1:0.06
2:0.08
3:0.08
4:0.09
5:0.08
6:0.08
7:0.08
8:0.08
9:0.08
10:0.09
11:0.08
Negative Logits
Ezek  -1.81
anwhile  -1.73
nown  -1.73
contacts  -1.72
aneously  -1.69
culosis  -1.69
Purch  -1.65
purchases  -1.62
entimes  -1.62
Relief  -1.60
Positive Logits
ctor  1.94
allo  1.80
procedural  1.70
Honest  1.68
Rule  1.62
olding  1.60
older  1.59
prejudice  1.56
errors  1.56
semantics  1.54
Activations Density 0.000%