INDEX
Explanations
personal interactions, conflicts, and emotional reactions
references to relationships and emotional responses
Negative Logits
Enhance      -0.69
imum         -0.63
Header       -0.62
ãĥ¡ (メ)      -0.61
eatures      -0.61
Depth        -0.60
Ranking      -0.60
ãĥij (パ)     -0.59
Rank         -0.59
native       -0.59
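The tokens ãĥ¡ and ãĥij are not corruption. They are how a GPT-2-style byte-level BPE vocabulary displays multi-byte UTF-8 characters: each raw byte is printed as a stand-in character. The sketch below reverses that mapping. It assumes two things the page does not state. First, the model uses the GPT-2 byte-to-unicode table. Second, the "ij" shown above is really the single character ĳ (U+0133), which is the stand-in for byte 0x91 in that table. The helper name `decode_token` is hypothetical.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable stand-in character, as in GPT-2's byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every other byte is shifted into the range starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


_CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a displayed BPE token back into the text it encodes (hypothetical helper)."""
    raw = bytes(_CHAR_TO_BYTE[c] for c in token)
    return raw.decode("utf-8", errors="replace")


# Assumption: the page's "ij" is the ligature U+0133, written as an escape here.
for tok in ["ãĥ¡", "ãĥ\u0133", "eatures"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Both Japanese tokens decode to katakana: bytes E3 83 A1 give メ and bytes E3 83 91 give パ. The ASCII-only subword "eatures" is returned unchanged.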
Positive Logits
refused      1.36
intervened   1.31
objected     1.22
begged       1.21
resorted     1.21
protested    1.20
persisted    1.19
yelled       1.18
insisted     1.17
resisted     1.17
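Ranked token lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix, then sorting the per-token scores. The page does not say how its values were computed, so treat this as an assumption. The sketch uses plain lists and tiny made-up matrices. The names `direct_logit_effect` and `top_and_bottom` are hypothetical, and it omits details a real pipeline might include, such as folding in the final layer norm.

```python
from typing import List, Sequence, Tuple


def direct_logit_effect(
    decoder_dir: Sequence[float], w_u: Sequence[Sequence[float]]
) -> List[float]:
    """Score each vocabulary token: decoder_dir (d_model) @ w_u (d_model x vocab)."""
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[j] * w_u[j][t] for j in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    decoder_dir: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted and k most-suppressed tokens with their scores."""
    logits = direct_logit_effect(decoder_dir, w_u)
    ranked = sorted(range(len(vocab)), key=lambda t: logits[t])  # ascending
    bottom = [(vocab[t], logits[t]) for t in ranked[:k]]
    top = [(vocab[t], logits[t]) for t in reversed(ranked[-k:])]
    return top, bottom


# Toy example (hypothetical data): d_model = 2, vocabulary of 4 tokens.
w_u = [[1.0, -1.0, 0.0, 2.0],
       [0.0, 0.0, 1.0, 0.0]]
decoder_dir = [1.0, 0.5]
top, bottom = top_and_bottom(decoder_dir, w_u, ["a", "b", "c", "d"], k=2)
print("positive:", top)
print("negative:", bottom)
```

Under this reading, a positive value such as 1.36 for "refused" means that adding the feature's direction to the residual stream directly raises that token's logit. A negative value means it directly lowers it.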
Activation Density: 0.340%
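Activation density is most naturally read as the share of sampled token positions on which the feature fires. On that reading, 0.340% is about 34 firings per 10,000 tokens. The page does not give the exact definition, so the sketch below assumes the common convention that "fires" means an activation strictly greater than zero. The function name `activation_density` and the synthetic activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed > 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic data: 34 nonzero activations among 10,000 token positions.
acts = [0.0] * 9966 + [1.2] * 34
print(f"{activation_density(acts):.3%}")
```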