INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.09
1:0.07
2:0.08
3:0.08
4:0.07
5:0.08
6:0.07
7:0.08
8:0.07
9:0.07
10:0.08
11:0.09
Negative Logits
��:-2.42
��:-2.06
�:-1.95
opian:-1.94
galitarian:-1.87
oples:-1.85
ONSORED:-1.82
gow:-1.82
nesota:-1.81
��:-1.78
Positive Logits
declass:2.20
credited:1.82
describ:1.81
Blizzard:1.80
attm:1.78
header:1.73
documenting:1.72
Graveyard:1.71
whistleblower:1.64
slip:1.64
Activations Density 0.000%
No Known Activations
This feature has no known activations.