INDEX
Explanations
pronouns related to ownership and possession
Negative Logits
quate     -0.06
arness    -0.06
othy      -0.06
owied     -0.06
Baron     -0.06
asic      -0.06
reib      -0.06
oft       -0.06
orra      -0.06
glove     -0.06
Positive Logits
behalf    0.13
occasion  0.07
lv        0.07
ollar     0.07
lap       0.06
pector    0.06
screen    0.06
lap       0.06
227       0.06
radar     0.06
Activations Density: 0.031%