INDEX
    Explanations

    occurrences of the word "have" in various forms

    Negative Logits
    " autorytatywna"      -0.64
    " disambiguazione"    -0.59
    "IntoConstraints"     -0.58
                          -0.57
    " themſelves"         -0.56
    " ujednoznacz"        -0.55
    " utveckling"         -0.55
    " kasarigan"          -0.51
    "лтемелер"            -0.50
    "homonymie"           -0.49
    Positive Logits
    " I"                  0.74
    "I"                   0.69
    " my"                 0.63
    " myself"             0.62
    " am"                 0.58
    "myself"              0.54
    " guess"              0.53
    " Myself"             0.51
    " أنا"                0.51
    "我没有"              0.50
    Activation Density: 0.055%

    No Known Activations