INDEX
    Explanations

    the presence of the phrase "the" in various contexts

    Negative Logits
     similarly    -0.86
    similar       -0.74
     Similar      -0.74
     similar      -0.74
    Similarly     -0.72
     Similarly    -0.71
    Similar       -0.70
     Like         -0.65
     SIMILAR      -0.63
     equally      -0.63
    Positive Logits
     ſame         1.19
     myſelf       1.10
     itſelf       0.98
     zelve        0.91
     themſelves   0.89
     sae          0.89
     saine        0.86
     samym        0.86
     sane         0.86
     Theſe        0.86
    Activation Density: 0.135%

    No Known Activations