INDEX
    Explanations

    repeated phrases or words, particularly those that emphasize importance or precedence

    first few words of phrases

    Negative Logits
    Token                 Logit
    "featureID"           -0.78
    "ſelf"                -0.58
    " pleaſure"           -0.57
    "EDEFAULT"            -0.56
    "ſelves"              -0.55
    [not rendered]        -0.54
    "RTSN"                -0.54
    "HasAnnotation"       -0.52
    "webElementXpaths"    -0.51
    " ſta"                -0.51
    Positive Logits
    Token                 Logit
    "UserScript"           0.49
    " Bowles"              0.44
    " babak"               0.42
    "DockStyle"            0.38
    " either"              0.35
    " early"               0.35
    " belangrijke"         0.35
    " terceira"            0.35
    "dropIfExists"         0.34
    [not rendered]         0.34
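The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A minimal sketch of how such lists are commonly produced, assuming (as in typical sparse-autoencoder dashboards) that each score is the dot product of the feature's decoder direction with the model's unembedding matrix. The names `top_logit_tokens`, `decoder_dir`, `W_U`, and `vocab` are hypothetical, the toy weights are random rather than this feature's, and a real dashboard may also apply layer-norm folding or other normalization.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    decoder_dir: (d_model,) decoder direction of one feature (hypothetical input).
    W_U: (d_model, d_vocab) unembedding matrix (hypothetical input).
    vocab: list of d_vocab token strings.
    """
    # Direct effect of the feature direction on every output logit.
    scores = decoder_dir @ W_U          # shape (d_vocab,)
    order = np.argsort(scores)          # token indices, ascending by score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with random weights (illustrative only, not real model data).
rng = np.random.default_rng(0)
d_model, d_vocab = 16, 50
vocab = [f"tok{i}" for i in range(d_vocab)]
decoder_dir = rng.normal(size=d_model)
W_U = rng.normal(size=(d_model, d_vocab))
neg, pos = top_logit_tokens(decoder_dir, W_U, vocab, k=3)
```

Because the score is a plain linear projection, a token's rank here reflects only the feature's direct path to the output; it says nothing about which inputs make the feature fire.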
    Activation density: 0.026%
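Activation density is usually reported as the share of sampled tokens on which the feature's activation is nonzero; at 0.026%, that is roughly one token in 3,850. A minimal sketch, assuming the activations are available as a NumPy array and that "active" means strictly greater than zero (the function name `activation_density` and the `threshold` parameter are hypothetical).

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    acts: array of feature activations, one value per token (any shape).
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 26 active tokens out of 100,000 gives a density of 0.026%.
acts = np.zeros(100_000)
acts[:26] = 1.5
print(f"{activation_density(acts):.3f}%")  # prints 0.026%
```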

    No Known Activations