INDEX
    NEGATIVE LOGITS
    Token           Logit
    " introduce"    -0.81
    " be"           -0.67
    " do"           -0.60
    " expose"       -0.60
    " create"       -0.60
    " reach"        -0.59
    " bring"        -0.59
    " get"          -0.59
    " make"         -0.58
    " have"         -0.57
    POSITIVE LOGITS
    Token              Logit
    "ArgsConstructor"  0.79
    " ſeveral"         0.76
    " auroit"          0.72
    " ſmall"           0.69
    " poffible"        0.69
    " greateſt"        0.69
    " feroit"          0.66
    " définiti"        0.66
    " theſe"           0.66
    " fevere"          0.65
    Activation Density: 0.043%

    No Known Activations