Explanations
No Explanations Found
Head Attribution Weights
Head  Weight
0     0.08
1     0.06
2     0.08
3     0.08
4     0.09
5     0.07
6     0.08
7     0.07
8     0.08
9     0.08
10    0.08
11    0.09
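The page lists per-head weights but does not define them. The sketch below shows one plausible reading for an SAE trained on concatenated attention-head outputs. It is an assumption, not taken from this page: the feature's decoder vector is split into equal per-head slices, and each head's weight is its share of the total squared L2 norm. With this definition the shares sum to exactly 1. The displayed weights sum to about 0.94, so the real tool may normalize or round differently. `head_attr_weights` and the toy vector are hypothetical.

```python
def head_attr_weights(decoder_vec, n_heads):
    """Hypothetical helper: split a decoder direction into n_heads equal slices
    and return each slice's share of the total squared L2 norm (assumed definition)."""
    if len(decoder_vec) % n_heads != 0:
        raise ValueError("decoder length must be divisible by n_heads")
    d_head = len(decoder_vec) // n_heads
    # Squared L2 norm of each head's slice of the decoder vector.
    sq_norms = [
        sum(x * x for x in decoder_vec[h * d_head:(h + 1) * d_head])
        for h in range(n_heads)
    ]
    total = sum(sq_norms)
    if total == 0:
        return [0.0] * n_heads  # all-zero decoder: no head gets any weight
    return [s / total for s in sq_norms]


# Toy decoder vector (made up): 3 heads, d_head = 2.
weights = head_attr_weights([1.0, 0.0, 0.0, 2.0, 1.0, 1.0], n_heads=3)
print([round(w, 3) for w in weights])  # [0.143, 0.571, 0.286]
```

Under this definition, near-uniform weights like the ones above (0.06 to 0.09 across 12 heads, where uniform would be about 0.083) would mean the feature's decoder direction is spread roughly evenly over the heads rather than concentrated in one.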
Negative Logits
Token           Logit
Reviewer        -1.97
profitability   -1.94
loyal           -1.83
legitimacy      -1.75
coffers         -1.71
blooded         -1.70
virtues         -1.67
superiority     -1.62
unsus           -1.60
arak            -1.60
Positive Logits
Token           Logit
��              1.92
raints          1.85
���             1.84
��              1.80
�               1.63
Zone            1.61
prefix          1.59
Feedback        1.59
Brief           1.57
█               1.53
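Lists of negative and positive logits like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token values. The sketch below is a minimal version under stated assumptions. It assumes the decoder direction is already in the residual-stream basis and that the logit is the plain product with the unembedding, with no layer-norm folding or centering. Real tooling may add those steps, and a feature defined on attention-head outputs would first need mapping through the output projection. `logit_effects` and the toy matrices are hypothetical.

```python
def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project a decoder direction through an unembedding
    matrix and return the k most negative and k most positive (token, logit) pairs."""
    d_model = len(decoder_dir)
    n_vocab = len(vocab)
    # W_U is a d_model x n_vocab nested list: logit[t] = sum_i dir[i] * W_U[i][t]
    logits = [
        sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
        for t in range(n_vocab)
    ]
    # Sort ascending by logit; the head is most negative, the tail most positive.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example (made-up numbers): d_model = 2, vocabulary of 3 tokens.
negative, positive = logit_effects(
    [1.0, -1.0],
    [[2.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]],
    ["a", "b", "c"],
    k=1,
)
print(negative, positive)  # [('b', -1.0)] [('a', 2.0)]
```

Because only the decoder direction and the unembedding are involved, these lists can be computed even for a feature that never activates on any sampled text.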
Activation Density: 0.000%
This feature has no known activations.
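Activation density is usually reported as the percentage of sampled token positions at which the feature's activation is above zero. A reading of 0.000% is consistent with the note that no activations were recorded. The sketch below assumes a strict "greater than threshold" test with a default threshold of 0. The actual tool's threshold and sample set are not stated here, and `activation_density` is a hypothetical name.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of token positions where the feature
    activation exceeds the threshold (assumed default: strictly above 0)."""
    if not activations:
        return 0.0  # no sampled positions: report zero density
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


print(f"{activation_density([0.0, 0.0, 0.0, 0.0]):.3f}%")  # 0.000%
print(f"{activation_density([0.0, 1.2, 0.0, 0.4]):.3f}%")  # 50.000%
```

With three decimal places, a density that prints as 0.000% could in principle hide a very small nonzero rate. Here, though, the accompanying note confirms that no activations are known.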