INDEX
    Explanations

    references to dimensions or measurements

    Negative Logits
    dol         -0.16
    mars        -0.14
    /private    -0.14
    loff        -0.14
    etat        -0.14
    foot        -0.14
    itary       -0.14
    kus         -0.13
    pcl         -0.13
    ather       -0.13
    Positive Logits
    ened        0.35
    ening       0.32
    wise        0.31
    iness       0.24
    iest        0.21
    ier         0.21
    ily         0.21
    lessness    0.20
    eners       0.18
    /color      0.18
    Activation Density: 0.061%

    No Known Activations