INDEX
    Explanations

    positive adjectives

    expressions of positive evaluations or praise

    Negative Logits
    "eters"        -0.81
    "ople"         -0.80
    "iper"         -0.73
    "eter"         -0.73
    "istan"        -0.72
    "pper"         -0.71
    "hyde"         -0.70
    "hip"          -0.70
    "hod"          -0.70
    " Pavilion"    -0.69
    Positive Logits
    "enough"         1.35
    " enough"        1.09
    "reads"          1.02
    " luck"          1.00
    "sword"          0.96
    " Enough"        0.92
    " intentions"    0.91
    " Samar"         0.91
    " ol"            0.88
    "luck"           0.86
    Act Density 0.064%

    No Known Activations