    Explanations

    phrases related to providing explanations or reasons

    Negative Logits
    atri              -0.74
    heit              -0.71
    arious            -0.71
    sha               -0.70
    arel              -0.69
    olit              -0.69
    iaries            -0.68
    Pixel             -0.68
    ography           -0.67
    isse              -0.67
    Positive Logits
     discrepancies    1.04
     deaths           1.03
     instability      1.02
     outbreaks        0.95
     widespread       0.92
     inconsistencies  0.92
     variance         0.92
     disappearance    0.91
     unexplained      0.91
     delays           0.90
    Activation Density 0.354%

    No Known Activations