INDEX
    Explanations
    No Explanations Found
    Negative Logits
     neighb     -0.74
     creditor   -0.65
     lamp       -0.65
     centres    -0.64
     centers    -0.64
     unpop      -0.64
     annex      -0.63
     spinning   -0.63
     newsp      -0.62
     lou        -0.61
    Positive Logits
    umat        0.88
    ensen       0.81
    oen         0.79
    ciples      0.76
    enic        0.76
    ahon        0.75
    ute         0.75
    heed        0.75
    ihad        0.75
    erest       0.73
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.