INDEX
    Explanations

    references to the name "Joe" or variations of it

    Negative Logits
    Token            Logit
    "̈́"              -0.71
    " rind"          -0.66
    "')\")"          -0.65
    ")\");\n"        -0.63
    " Michael"       -0.63
    "FontAwesome"    -0.62
    "Michael"        -0.61
    "\"],\n"         -0.60
    "'))\n"          -0.60
    "LEncoder"       -0.60

    Positive Logits
    Token            Logit
    " Joe"            2.55
    "Joe"             2.31
    " joe"            2.11
    " JOE"            2.04
    "JOE"             1.77
    " Joseph"         1.76
    "joe"             1.75
    "Joseph"          1.61
    " joseph"         1.49
    " JOSEPH"         1.39
    Act Density 0.039%

    No Known Activations
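
    The explanation can be checked against the positive logits listed above. The following is a minimal sketch, not part of the original dashboard: it strips whitespace and case from the ten listed tokens and groups them by name variant. The variable and function names are illustrative, and reading "Act Density" as the percentage of tokens on which the feature fires is an assumption.

```python
from collections import defaultdict

# Top positive logits copied from the listing above: (token, logit).
# Leading spaces are part of the token and are kept on purpose.
positive_logits = [
    (" Joe", 2.55), ("Joe", 2.31), (" joe", 2.11), (" JOE", 2.04),
    ("JOE", 1.77), (" Joseph", 1.76), ("joe", 1.75), ("Joseph", 1.61),
    (" joseph", 1.49), (" JOSEPH", 1.39),
]


def normalize(token: str) -> str:
    """Collapse spacing and casing variants of a token to one key."""
    return token.strip().lower()


# Group the logits by normalized name.
groups = defaultdict(list)
for token, logit in positive_logits:
    groups[normalize(token)].append(logit)

# For each name variant: (number of tokens, highest logit).
summary = {name: (len(vals), max(vals)) for name, vals in groups.items()}
print(summary)

# Assumption: "Act Density 0.039%" is the share of tokens that activate
# the feature, so on average one activation per this many tokens.
density_percent = 0.039
tokens_per_activation = 100 / density_percent
print(round(tokens_per_activation))
```

    All ten tokens collapse to either "joe" or "joseph", which is consistent with the explanation given for this feature.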