    Explanations

    phrases that indicate possession or existence

    Negative Logits
    " themſelves"       -0.59
    " myſelf"           -0.59
    "ſelf"              -0.58
    " pleaſure"         -0.57
    " Monfieur"         -0.56
    " houſe"            -0.53
    " Reſ"              -0.52
    " Jefus"            -0.51
    "RegressionTest"    -0.51
    " reaſon"           -0.49
    Positive Logits
    "stood"             0.69
    " lots"             0.65
    " fewest"           0.64
    " a"                0.64
    " an"               0.61
    " such"             0.59
    " ligiloj"          0.59
    " fewer"            0.58
    " features"         0.56
    " its"              0.56
    Activation Density: 0.476%

    No Known Activations