INDEX
Explanations
phrases related to financial responsibility or accountability
Head Attr Weights
0:0.06
1:0.07
2:0.22
3:0.05
4:0.05
5:0.04
6:0.05
7:0.07
8:0.05
9:0.05
10:0.18
11:0.06
Negative Logits
maxwell:-3.04
atis:-2.98
ilateral:-2.92
alys:-2.87
ould:-2.79
vier:-2.75
uti:-2.70
reat:-2.65
£:-2.63
oeuv:-2.63
Positive Logits
Me:2.94
Meyer:2.52
Neb:2.41
EW:2.36
Sans:2.35
Noir:2.32
Megan:2.25
Gel:2.24
Ober:2.23
Jenner:2.19
Activation Density: 0.000%