INDEX
    Explanations

    references to family dynamics and social relationships

    Negative Logits
    " Narrow"      -0.15
    "acher"        -0.14
    " (<"          -0.14
    " less"        -0.14
    " Tiny"        -0.14
    "apid"         -0.13
    " narrowed"    -0.13
    " weniger"     -0.13
    " fewer"       -0.13
    "]<="          -0.13
    Positive Logits
    " large"       0.87
    " larger"      0.83
    "large"        0.77
    " Larger"      0.73
    " LARGE"       0.70
    " Large"       0.70
    " bigger"      0.69
    "Large"        0.69
    "-large"       0.66
    " big"         0.66
    Act Density 0.556%

    No Known Activations