INDEX
    Explanations

    the term "simple" and related concepts of simplicity or basic ideas

    Negative Logits
    hawk          -0.16
    oller         -0.16
    ipur          -0.16
    ntax          -0.15
    urement       -0.15
    mey           -0.15
    orden         -0.14
    ogi           -0.14
    utto          -0.14
    znam          -0.14
    Positive Logits
     Jeremy       0.16
    lak           0.16
    ichel         0.15
     reconcile    0.14
     reconc       0.14
     icon         0.14
    uese          0.14
    ост           0.14
     rooms        0.14
    Interpolator  0.14
    Act Density 0.023%

    No Known Activations