INDEX
    Explanations

    phrases indicating past usage and engagement in activities

    Negative Logits
    "énario"          -0.60
    "reszcie"         -0.55
    "Портали"         -0.52
    "HasAnnotation"   -0.51
    "Predecesor"      -0.49
    "üstü"            -0.46
    " désolés"        -0.44
    " protoimpl"      -0.44
    " got"            -0.43
    Positive Logits
    " autorytatywna"   0.44
    " Autorizaciones"  0.39
    "OGND"             0.38
    "fören"            0.38
    " feroit"          0.38
    " singoli"         0.35
    "SharedCtor"       0.34
    " Constitu"        0.34
    "<bos>"            0.33
    "Hochspringen"     0.33
    Activation Density: 0.152%

    No Known Activations