INDEX
    Explanations

    numerical thresholds related to difficulty levels

    Negative Logits
    ovsky       -0.17
    ese         -0.16
     Baghd      -0.15
    ller        -0.15
    styleType   -0.15
    ych         -0.15
     Ames       -0.15
    alse        -0.14
    AGON        -0.14
    ereotype    -0.14
    Positive Logits
    åĿĽ         0.17
    rosse       0.15
    obic        0.15
     Sherman    0.14
    uter        0.14
    úi          0.14
    aign        0.14
    ostel       0.14
    HQ          0.13
    à¥Ĥद        0.13
    Act Density 0.030%

    No Known Activations