    Negative Logits
     harbor       -0.87
     colorless    -0.86
     harbors      -0.84
    Neighbors     -0.82
     vapors       -0.81
     colored      -0.78
    Defense       -0.77
     colors       -0.76
     Colored      -0.75
     splendor     -0.75
    Positive Logits
     myſelf       0.77
    '],'          0.77
     }}$}         0.75
    //});         0.70
    })*/          0.69
    ".\n          0.69
     raiſ         0.68
     itſelf       0.68
    '}>           0.66
     ...]         0.66
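One common way dashboards like this produce the two logit lists is to project the feature's decoder direction through the model's unembedding matrix and read off the lowest- and highest-scoring tokens. The page itself does not say how its values were computed, so the sketch below only assumes that approach. The function `top_logits`, the toy two-dimensional vectors, and the three-token vocabulary are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: a 2-d decoder vector for one feature and a tiny
# vocabulary whose unembedding columns are also 2-d.
TOY_DECODER = [1.0, -0.5]
TOY_UNEMBED = {
    " harbor": [-0.5, 0.8],
    " itſelf": [0.4, -0.6],
    "'],'": [0.3, 0.1],
}


def top_logits(
    decoder_vec: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative, most positive) token logits, k of each.

    Each token's logit is the dot product of the feature's decoder
    vector with that token's unembedding column (assumed method).
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_vec, col))
        for tok, col in unembed.items()
    }
    # Ascending order: the most negative logits come first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


if __name__ == "__main__":
    neg, pos = top_logits(TOY_DECODER, TOY_UNEMBED, k=1)
    for tok, val in neg + pos:
        print(f"{tok!r:>12} {val:+.2f}")
```

With k=1 the toy data puts " harbor" at the negative end (about -0.90) and " itſelf" at the positive end (about +0.70). A real model would use its full vocabulary and hidden size, but the ranking step is the same.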
    Activation Density 0.837%
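Activation density is usually reported as the share of tokens on which the feature fires. At 0.837%, that works out to roughly 8 of every 1,000 tokens. The sketch below assumes "fires" means an activation strictly above zero over some sample of tokens. The function `activation_density`, its `threshold` parameter, and the sample data are hypothetical and do not come from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    # Count tokens on which the feature is active (assumed: value > threshold).
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 837 firing tokens out of 100,000.
    acts = [0.0] * 100_000
    for i in range(837):
        acts[i * 100] = 1.2
    print(f"{activation_density(acts):.3f}%")  # 0.837%
```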

    No Known Activations