INDEX
    Explanations

    phrases or expressions indicating difficulty or challenges

    Negative Logits
    Token      Logit
    "yar"      -0.15
    "ÏģÏħ"     -0.15
    "ampus"    -0.15
    "osen"     -0.14
    "unger"    -0.14
    "alı"      -0.14
    "nid"      -0.13
    "iw"       -0.13
    "ainers"   -0.13
    "erk"      -0.13
    Positive Logits
    Token        Logit
    " lay"       0.47
    " average"   0.41
    " non"       0.39
    " nov"       0.38
    " casual"    0.37
    " Average"   0.36
    " Lay"       0.35
    "average"    0.34
    "Average"    0.34
    " Non"       0.33
    Act Density 0.074%

    No Known Activations