INDEX
    Explanations

    mentions of family members, specifically mothers and fathers

    Negative Logits
    ;"></       -0.81
    ]';         -0.81
     Rais       -0.80
    gewiesen    -0.79
    }.\n        -0.77
    '));\n      -0.76
     },\n       -0.73
    endence     -0.73
    ']>;        -0.71
     beit       -0.70
    Positive Logits
     dads       1.18
     dad        1.12
     moms       1.11
     guys       1.10
     wanna      1.07
     mom        1.06
     boobs      1.05
     gonna      1.04
     GONNA      1.02
     Dad        1.01
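Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most positive and most negative tokens. The sketch below assumes that approach (and ignores any final layer-norm scaling); `top_logit_effects`, `w_dec`, `w_u`, and the toy vocabulary are hypothetical names for illustration, not this dashboard's actual code.

```python
import numpy as np

def top_logit_effects(w_dec, w_u, vocab, k=10):
    """Rank tokens by one feature direction's direct effect on the logits.

    Assumptions (hypothetical, for illustration):
      w_dec: (d_model,) decoder direction for a single feature.
      w_u:   (d_model, d_vocab) unembedding matrix.
      vocab: list of d_vocab token strings.
    Returns (positive, negative) lists of (token, logit) pairs:
    positive is largest-first, negative is most-negative-first.
    """
    logits = w_dec @ w_u          # (d_vocab,) logit contribution per token
    order = np.argsort(logits)    # token indices in ascending logit order
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: hypothetical 3-dim model with a 4-token vocabulary.
vocab = [" mom", "endence", " dad", " the"]
w_dec = np.array([1.0, 0.0, 0.0])
w_u = np.array([
    [0.5, -0.2, 1.1, 0.0],
    [0.3, 0.3, 0.3, 0.3],
    [0.0, 0.0, 0.0, 0.0],
])
pos, neg = top_logit_effects(w_dec, w_u, vocab, k=2)
print(pos)  # [(' dad', 1.1), (' mom', 0.5)]
print(neg)  # [('endence', -0.2), (' the', 0.0)]
```

Because only the decoder direction enters the product, these lists describe what the feature pushes the model to predict, independent of which inputs make it fire.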
    Act Density 0.062%
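Activation density is usually read as the percentage of token positions on which the feature is active, so 0.062% corresponds to roughly 62 firing tokens per 100,000. A minimal sketch, assuming "active" means an activation strictly above zero (the `activation_density` helper and the toy data are hypothetical):

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Hypothetical data: the feature fires on 62 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:62] = 0.5
density = activation_density(acts)
print(density)  # 0.062
```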

    No Known Activations