    Explanations

    References to specific names and characters.

    Negative Logits
    Token          Logit
    " Darius"      -0.57
    "Darren"       -0.54
    " Darren"      -0.54
    " Dutchman"    -0.53
    " Wilfred"     -0.53
    "Rodney"       -0.52
    "‍♂️"           -0.52
    " Dwayne"      -0.52
    " Eric"        -0.51
    "carlos"       -0.51
    Positive Logits
    Token          Logit
    "<?"           0.52
    " Ann"         0.49
    " rostros"     0.48
    "Rose"         0.46
    "Ann"          0.45
    "unzel"        0.43
    "Grace"        0.43
    " Rose"        0.43
    " oídos"       0.43
    " Grace"       0.42
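    The two logit lists are typically read as follows. The negative list holds the tokens whose output logits this feature most suppresses. The positive list holds the tokens it most promotes.

    Below is a minimal sketch of how such lists are commonly produced. It assumes each score is the dot product of the feature's decoder direction with a token's unembedding vector; this dashboard does not state its exact method. The function names and the two-dimensional toy vectors are hypothetical and purely illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_scores(decoder_dir: Sequence[float],
                 unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector (assumed method)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(scores: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # ascending: most suppressed token first
    positive = ranked[::-1][:k]  # descending: most promoted token first
    return negative, positive


# Hypothetical toy data: a 2-d decoder direction and three token vectors.
decoder_dir = [1.0, -1.0]
unembed = {" Ann": [0.5, 0.0], " Eric": [0.0, 0.5], "<?": [0.2, 0.1]}
neg, pos = top_and_bottom(logit_scores(decoder_dir, unembed), k=1)
print(neg, pos)
```

    Real models have vocabularies of tens of thousands of tokens and much wider vectors, so in practice this is done as a single matrix-vector product. The ranking step is the same.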
    Activation density: 0.763%
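    Activation density is commonly taken to be the share of sampled token positions on which the feature fires. Under that reading, 0.763% means the feature is active on roughly 7.6 of every 1,000 tokens.

    The sketch below assumes "fires" means an activation strictly above a threshold of 0. Both the threshold and the function name are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`
    (assumed definition; the threshold of 0 is hypothetical)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical activations over four token positions: one fires.
print(activation_density([0.0, 1.2, 0.0, 0.0]))
```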

    No Known Activations