INDEX
    Explanations

    instances of phrases or expressions that refer to sets of items or categories

    Negative Logits
    535             -0.18
    pedia           -0.17
    etik            -0.15
    ONGL            -0.15
    .metamodel      -0.14
    ologia          -0.14
    ongs            -0.14
    ologically      -0.14
    ior             -0.14
    oui             -0.14
    Positive Logits
    ï¸ı             0.16
    efon            0.15
    illin           0.14
    amilia          0.14
    etine           0.14
    (s              0.14
    erb             0.14
    пор             0.13
    ồi              0.13
    ames            0.13
    Activation Density: 0.010%

    No Known Activations