    Explanations

    references to human relationships and personal narratives

    Negative Logits
    Token       Logit
    "]));       -0.73
    '];         -0.69
    ']);        -0.66
    "]);        -0.66
    Hentet      -0.65
    ']));       -0.64
    "];         -0.63
    ")));       -0.63
    تانيه       -0.60
    */          -0.60
    Positive Logits
    Token         Logit
    featureID     0.74
    joined        0.61
    whom          0.60
    specialize    0.57
    specializes   0.55
    helped        0.54
    assisted      0.54
    deserve       0.52
    represents    0.51
    preceded      0.51
    Activation Density: 0.365%

    No Known Activations