INDEX
    NEGATIVE LOGITS
    Token           Logit
    "xual"          -0.83
    "nces"          -0.81
    "requent"       -0.78
    "lav"           -0.70
    "cientious"     -0.69
    " PLUS"         -0.69
    "iors"          -0.68
    " photograp"    -0.67
    "afe"           -0.66
    "idental"       -0.66
    POSITIVE LOGITS
    Token           Logit
    " proverbial"   0.86
    " stone"        0.83
    " coffin"       0.81
    " rotten"       0.78
    " hay"          0.77
    " crystal"      0.76
    " feathers"     0.76
    " roses"        0.74
    " iceberg"      0.73
    " shovel"       0.72
    Activation Density: 0.287%

    No Known Activations