INDEX
    Explanations

    References to individuals, particularly occurrences of the word "person."

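    Negative Logits
    Token                      Logit
    "Ĭ"                        -2.64
    "ĵ"                        -2.61
    "·¸"                       -2.53
    "·"                        -2.43
    "ı"                        -2.41
    (run of spaces)            -2.40
    "↵↵" followed by spaces    -2.40
    (long run of spaces)       -2.40
    (run of spaces)            -2.40
    (token not rendered)       -2.40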
    Positive Logits
    Token                      Logit
    "nel"                       2.41
    " who"                      1.88
    "ila"                       1.86
    "uscript"                   1.81
    "nal"                       1.75
    "iscus"                     1.73
    " owns"                     1.70
    "ager"                      1.65
    "acles"                     1.65
    "arman"                     1.65
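    The page does not say how these logit values are produced. A common approach for sparse-autoencoder features, assumed here rather than confirmed by this dashboard, is to multiply the feature's decoder direction by the model's unembedding matrix and list the k highest and k lowest entries. The sketch below uses plain Python lists; `logit_effects`, `top_and_bottom`, and the toy vocabulary and weights are hypothetical and are not data from this feature.

```python
# Sketch under an assumption: a feature's direct effect on the output logits is
# its decoder direction multiplied by the unembedding matrix. All names and
# numbers below are hypothetical toy values, not data from this dashboard.

def logit_effects(decoder_dir, unembed):
    """Multiply a d_model vector by a (d_model x vocab) matrix; one logit per token."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(logits, vocab, k):
    """Return the k most positive and the k most negative (token, logit) pairs."""
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])  # ascending
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom


vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, 2.0]                      # toy d_model = 2
unembed = [[1, 0, -1, 3],                     # toy vocab size = 4
           [0, 1, 2, -2]]

logits = logit_effects(decoder_dir, unembed)  # [1.0, 2.0, 3.0, -1.0]
top, bottom = top_and_bottom(logits, vocab, k=2)
print(top)     # [('tok_c', 3.0), ('tok_b', 2.0)]
print(bottom)  # [('tok_d', -1.0), ('tok_a', 1.0)]
```

    A real vocabulary typically has tens of thousands of tokens, so a practical version would use a matrix library, but the ranking step stays the same.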
    Activation density: 0.185%
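    The page does not define activation density either. Assuming it means the percentage of sampled token positions on which the feature's activation is above zero, it can be computed as below. The function `activation_density`, its `threshold` parameter, and the sample activations are hypothetical; some tools may use a nonzero firing threshold.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: the feature fires on 5 of 2000 token positions.
acts = [0.0] * 1995 + [0.7, 1.2, 0.3, 2.1, 0.9]
print(f"{activation_density(acts):.3f}%")  # 0.250%
```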

    No Known Activations