INDEX
    Explanations

    phrases that indicate attribution or sourcing of information

    Negative Logits
    Token           Logit
    "ukan"          -0.16
    (whitespace)    -0.15
    "raison"        -0.15
    "leo"           -0.15
    "yb"            -0.14
    "imized"        -0.14
    "iversity"      -0.14
    "urum"          -0.14
    "resse"         -0.13
    "icers"         -0.13
    Positive Logits
    Token           Logit
    "ly"            0.28
    "ately"         0.19
    "LY"            0.18
    " according"    0.17
    "ally"          0.17
    "ÑģÑĮ"          0.17
    "ingly"         0.15
    "alf"           0.15
    "ances"         0.15
    " legend"       0.15
    Act Density 0.035%

    No Known Activations
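
    The positive-logit token shown as ÑģÑĮ looks like the byte-level display form of a non-ASCII token rather than readable text. The sketch below assumes the underlying model uses a GPT-2-style byte-level BPE vocabulary, which the listing itself does not state. Under that assumption it rebuilds the standard byte-to-character table and inverts it to recover the original string. The helper name decode_display_token is hypothetical and chosen only for illustration.

```python
def bytes_to_unicode():
    """Byte -> printable-character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_display_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level display token back into text.

    Assumes a GPT-2-style byte-level vocabulary; raises KeyError if the
    token contains a character outside that 256-entry table.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_display_token("ÑģÑĮ"))
```

    If the assumption holds, the token decodes to the Cyrillic suffix "сь", so it may not be noise. Plain ASCII tokens such as "ly" pass through unchanged.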