INDEX
    Negative Logits
    Token            Logit
    " srfAttach"     -0.76
    " Accuracy"      -0.69
    " Labrador"      -0.64
    " extent"        -0.64
    "VALUE"          -0.63
    "urious"         -0.60
    " jerk"          -0.60
    " degree"        -0.59
    "aternity"       -0.58
    " Yard"          -0.57
    Positive Logits
    Token            Logit
    "wen"             0.84
    "ko"              0.82
    "jen"             0.82
    "kie"             0.82
    "swer"            0.82
    "nes"             0.82
    "thia"            0.81
    "stant"           0.81
    "ses"             0.81
    "lied"            0.81
    Activation Density: 0.004%

    No Known Activations