Explanations
references to family dynamics and social relationships
Negative Logits
Narrow      -0.15
acher       -0.14
(<          -0.14
less        -0.14
Tiny        -0.14
apid        -0.13
narrowed    -0.13
weniger     -0.13
fewer       -0.13
]<=         -0.13
Positive Logits
large        0.87
larger       0.83
large        0.77
Larger       0.73
LARGE        0.70
Large        0.70
bigger       0.69
Large        0.69
-large       0.66
big          0.66
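The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A minimal sketch of how such lists are commonly produced, assuming the score for each token is the dot product of the feature's decoder direction with that token's column of the unembedding matrix, with the final layer norm ignored. The function name, argument layout, and toy numbers are hypothetical and not the dashboard's actual code:

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (hypothetical helper).

    decoder_dir: list of d_model floats (the feature's decoder direction).
    unembed:     d_model x n_vocab nested list (assumed unembedding matrix W_U).
    vocab:       list of n_vocab token strings.
    Assumption: effect on token j = sum_i decoder_dir[i] * unembed[i][j].
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    n_vocab = len(vocab)
    # One score per token: the decoder direction projected onto that token's column.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(n_vocab)
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, largest first
    return negative, positive


# Toy usage with a 2 x 3 unembedding matrix (made-up values).
vocab = ["large", "less", "big"]
W_U = [[0.9, -0.1, 0.5],
       [0.0, 0.3, 0.2]]
neg, pos = top_logit_effects([1.0, 0.0], W_U, vocab, k=2)
print(pos)  # [('large', 0.9), ('big', 0.5)]
```

Under this assumption, a feature whose decoder direction aligns with the unembedding columns for size words would yield a positive list like the one above.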
Activation Density: 0.556%
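A minimal sketch of the density figure, assuming it is the percentage of sampled tokens on which the feature's activation is above zero. The helper name and the token counts in the example are hypothetical, chosen only so that the result matches 0.556%:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds threshold (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature fires.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 556 active tokens out of 100,000 sampled tokens (made-up counts).
print(activation_density([0.0] * 99_444 + [1.0] * 556))  # 0.556
```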