INDEX
    Explanations

    expressions of affection and appreciation

    Negative Logits
    zelf          -0.14
    irc           -0.13
    erra          -0.13
    another       -0.13
    appen         -0.13
    href          -0.13
    anton         -0.13
    -même         -0.13
    soon          -0.13
    roi           -0.13
    Positive Logits
     how          0.29
     hearing      0.28
     seeing       0.23
     everything   0.23
     nothing      0.22
    eeee          0.22
     anything     0.20
    ee            0.20
    eee           0.20
    -lo           0.19
    Activation Density 0.089%

    No Known Activations