Explanations
No Explanations Found
Head Attr Weights
Head  Weight
0     0.08
1     0.06
2     0.07
3     0.08
4     0.08
5     0.08
6     0.08
7     0.08
8     0.07
9     0.10
10    0.08
11    0.07
Negative Logits
Token       Logit
index       -1.80
averaging   -1.71
iencies     -1.58
weighs      -1.53
owment      -1.52
Monthly     -1.52
petitioner  -1.51
incur       -1.50
cohol       -1.49
entails     -1.43
Positive Logits
Token       Logit
theirs      2.15
Reilly      2.04
IRC         1.96
horm        1.95
orks        1.95
Reloaded    1.92
amples      1.81
eus         1.80
ueller      1.76
comings     1.66
Activation Density: 0.000%
This feature has no known activations.