INDEX
Explanations
demonstratives or pronouns referring to a specific group of people
references to specific groups of individuals
Negative Logits
Token         Logit
ILY           -0.78
ob            -0.74
enegger       -0.72
inson         -0.70
onis          -0.70
ointment      -0.69
Minion        -0.68
Luck          -0.67
Resolution    -0.67
Drag          -0.63
Positive Logits
Token         Logit
kinds         0.96
pesky         0.85
sorts         0.84
wishing       0.80
surveyed      0.79
favoring      0.75
interested    0.74
attending     0.73
who           0.73
aspects       0.72
Activations Density 0.068%
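
To give a sense of scale for the density figure, the short sketch below converts it into an expected count of activating tokens. It assumes that "Activations Density" means the fraction of tokens on which the feature has a nonzero activation; the page itself does not define the term. The function name `expected_activations` and the 100,000-token sample size are hypothetical, chosen only for illustration.

```python
def expected_activations(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the feature fires.

    Assumption (not stated on the page): density is the fraction of
    tokens with nonzero activation, expressed as a percentage.
    """
    return density_percent / 100.0 * n_tokens


# Hypothetical sample of 100,000 tokens at the reported density of 0.068%.
count = expected_activations(0.068, 100_000)
print(f"about {count:.0f} activating tokens per 100,000")
```

Under that assumption the feature is sparse: it would fire on well under one token in a thousand.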