INDEX
    Explanations

    phrases expressing strong enthusiasm or commitment towards a subject or activity

    Negative Logits
    erness      -0.17
    ázev        -0.16
    ture        -0.15
    warf        -0.15
    udeau       -0.15
    anson       -0.15
    ependency   -0.15
    tual        -0.14
    ken         -0.14
    wij         -0.14
    Positive Logits
     Ramp       0.16
     about      0.16
    rh          0.15
     About      0.14
     amp        0.14
    ouched      0.14
     rouge      0.14
     behalf     0.13
    atic        0.13
    ened        0.13
    Activation Density: 0.030%

    No Known Activations