INDEX
Explanations
life events and personal details described in narrative text
Negative Logits
Token    Logit
enko     -0.31
eni      -0.26
imar     -0.26
Hort     -0.25
ped      -0.25
uner     -0.24
Dek      -0.23
elist    -0.23
anda     -0.23
utra     -0.23
Positive Logits
Token         Logit
ILCS          0.24
misunder      0.24
holes         0.23
masc          0.23
ops           0.22
inequalities  0.22
services      0.22
lineback      0.22
loopholes     0.22
ridges        0.22
Activations Density: 0.022%