INDEX
Explanations
attends to generalized articles or pronouns from specified career-related terms
Head Attr Weights
0:0.09
1:0.11
2:0.10
3:0.14
4:0.12
5:0.06
6:0.20
7:0.15
Negative Logits
Przypisy        -0.27
HRM             -0.27
eteria          -0.27
ujednoznacz     -0.26
ην              -0.26
Hentet          -0.25
дописавши       -0.25
wapV            -0.25
HMIS            -0.25
Datuak          -0.24
Positive Logits
MarshalTo       0.36
puissiez        0.29
__*/            0.29
giustizia       0.29
<<<<<<<<<<<<<<  0.29
상세            0.28
youll           0.28
akti            0.27
ustain          0.27
req             0.27
Activations Density: 0.065%