INDEX
    Explanations

    the name "Robert" in various contexts and forms

    Negative Logits
    " whoſe"     -1.07
    " itſelf"    -0.98
    " ſhould"    -0.94
    " himſelf"   -0.87
    " cauſe"     -0.85
    " whofe"     -0.85
    " Theſe"     -0.84
    "osť"        -0.84
    " againſt"   -0.82
    " becauſe"   -0.82
    Positive Logits
    " Robert"    1.38
    "Robert"     1.26
    " Roberts"   1.17
    " ROBERT"    1.16
    "ROBERT"     1.11
    "robert"     1.09
    " robert"    1.09
    "Roberts"    1.05
    " Rober"     1.00
    " Bob"       0.98
    Activation Density: 0.009%

    No Known Activations