INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.09
1:0.06
2:0.09
3:0.07
4:0.08
5:0.08
6:0.07
7:0.08
8:0.07
9:0.08
10:0.09
11:0.08
Negative Logits
thood      -2.01
ython      -1.99
isine      -1.80
AME        -1.79
TeX        -1.79
HTML       -1.77
anguages   -1.77
ciating    -1.70
eq         -1.67
EVs        -1.62
Positive Logits
fathers    1.61
desp       1.52
sweep      1.50
Kok        1.48
Lau        1.46
divert     1.44
setback    1.41
ris        1.40
petition   1.39
disp       1.38
Activations Density 0.000%
No Known Activations
This feature has no known activations.