INDEX
Explanations
words related to actions that indicate possession or management of tasks and responsibilities
Head Attr Weights
0:0.12
1:0.35
2:0.01
3:0.03
4:0.02
5:0.17
6:0.03
7:0.02
8:0.06
9:0.06
10:0.04
11:0.02
Negative Logits
gian      -1.84
istg      -1.84
mson      -1.71
uld       -1.61
ylene     -1.58
alid      -1.54
Gentle    -1.53
Ul        -1.51
chel      -1.50
sqor      -1.50
Positive Logits
it        1.80
slate     1.74
itaire    1.64
it        1.63
iture     1.60
Reason    1.59
ity       1.56
ito       1.54
gered     1.52
iencies   1.52
Activations Density 0.010%