INDEX
    Explanations

    phrases expressing uncertainty or questioning

    Negative Logits
    Shin                -0.48
    Cecil               -0.47
    bsen                -0.47
    (-\                 -0.45
    ViewModels          -0.45
    (-                  -0.44
     Wör                -0.44
     Lij                -0.44
    morris              -0.44
    Dich                -0.44
    Positive Logits
     unknow             0.90
     disambiguazione    0.82
    TintMode            0.81
     يتيمه              0.78
     dunno              0.76
     Unknown            0.74
    我不知道            0.74
     unknown            0.73
     desconoc           0.73
     дописавши          0.72
    Activation Density 0.205%

    No Known Activations