    Explanations

    references to categories or types of items or concepts

    Negative Logits
    Token               Logit
    "featureID"         -0.63
    "contentLoaded"     -0.61
    "OGND"              -0.59
    "transQ"            -0.59
    "SourceChecksum"    -0.57
    "waitKey"           -0.55
    " bağlantılar"      -0.54
    " terciopelo"       -0.53
    " propOrder"        -0.53
    "angliski"          -0.53
    Positive Logits
    Token               Logit
    "ztály"             0.42
    "entang"            0.40
    " of"               0.38
    "łgorzata"          0.36
    " those"            0.36
    " Huck"             0.35
    " Ashley"           0.35
    " Hayley"           0.35
    "ombic"             0.35
    " Katie"            0.33
    Activation Density: 0.012%

    No Known Activations