INDEX
Explanations
possessive pronouns and associated locations
Negative Logits
diarrhea        0.47
choice          0.43
affected        0.40
preference      0.39
saliva          0.38
escolha         0.38
teeth           0.37
desires         0.37
discrimination  0.37
disapproval     0.37
Positive Logits
身边        0.61
beck        0.55
doorstep    0.47
side        0.46
corner      0.46
hip         0.46
horizonte   0.42
Side        0.41
door        0.41
horizon     0.41
Activations Density 0.010%