INDEX
Explanations
places or scenarios related to specific contexts or settings
references to societal norms and issues regarding marriage and relationships
Head Attr Weights
0:0.06
1:0.02
2:0.10
3:0.10
4:0.04
5:0.13
6:0.05
7:0.09
8:0.08
9:0.04
10:0.15
11:0.08
Negative Logits
��  -0.96
obin  -0.95
avez  -0.91
��  -0.91
Malley  -0.84
berman  -0.81
ALS  -0.78
LLP  -0.77
imester  -0.77
bilt  -0.76
Positive Logits
existed  0.85
exists  0.84
goes  0.83
enters  0.83
mattered  0.83
belongs  0.82
survives  0.79
attaches  0.77
happens  0.77
came  0.76
Activations Density 0.097%