INDEX
    Explanations

    instances of affirmation or agreement in dialogue

    Negative Logits
    Token           Logit
    "aine"          -0.16
    "uben"          -0.16
    "obili"         -0.15
    "766"           -0.15
    "\Module"       -0.14
    "ixel"          -0.14
    "uzzi"          -0.14
    "Host"          -0.14
    "zas"           -0.14
    "uct"           -0.14
    Positive Logits
    Token           Logit
    " Lever"        0.15
    " Lambert"      0.15
    " bridge"       0.15
    " arc"          0.15
    " Spectrum"     0.14
    " Spe"          0.14
    "_TUN"          0.14
    "implicitly"    0.14
    " cliff"        0.14
    "ãĤ·ãĥªãĥ¼ãĤº"  0.13
    Activation Density: 0.023%

    No Known Activations