Explanations
No Explanations Found
Head Attr Weights
Head   Weight
0      0.08
1      0.07
2      0.08
3      0.07
4      0.09
5      0.07
6      0.09
7      0.08
8      0.07
9      0.09
10     0.08
11     0.07
Negative Logits
Token            Logit
イト             -3.15
Eff              -2.98
Measures         -2.78
Refuge           -2.77
Shel             -2.69
Contributions    -2.65
nces             -2.60
ategor           -2.60
Dele             -2.59
裏覚醒           -2.59
Positive Logits
Token            Logit
orne             2.82
panic            2.71
WB               2.67
Pulitzer         2.67
STON             2.63
arte             2.61
warr             2.58
Seoul            2.51
Rockefeller      2.47
wig              2.47
Activations Density 0.000%
No Known Activations
This feature has no known activations.