INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.07
1:0.08
2:0.07
3:0.09
4:0.09
5:0.08
6:0.06
7:0.09
8:0.09
9:0.08
10:0.08
11:0.08
Negative Logits
endas    -3.40
inho     -3.03
aucuses  -2.92
veto     -2.90
ishops   -2.87
eto      -2.85
mun      -2.80
Pope     -2.75
ushima   -2.74
adra     -2.73
Positive Logits
ABE        3.68
RL         2.88
MM         2.85
Stevenson  2.80
LI         2.79
QC         2.76
JR         2.66
TN         2.62
ALE        2.62
WD         2.61
Activations Density 0.000%
This feature has no known activations.