INDEX
    Explanations

    categories of content related to location or classification

    categories or types of entities

    Negative Logits
        "elf"             -0.42
        "y"               -0.39
        " Opfer"          -0.37   (German, "victim")
        "alu"             -0.37
        "hy"              -0.36
        "sy"              -0.36
        " aloud"          -0.35
        "long"            -0.35
        " teasing"        -0.35
        " harassment"     -0.35
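    Positive Logits
        " tartalomajánló"   0.96   (Hungarian, "content recommender")
        "Portail"           0.92   (French, "portal")
        "بوابة"             0.82   (Arabic, "portal, gateway")
        "Datuak"            0.79   (Basque, "data")
        "Portale"           0.73   (Italian, "portal")
        " الحره"            0.73   (Arabic, "the free")
        "DockStyle"         0.72
        " Италијани"        0.72   (Serbian/Macedonian, "Italians")
        " bezeichneter"     0.69   (German, "designated")
        "extAlignment"      0.67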
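In SAE feature dashboards of this kind, the logit values are typically obtained by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. This page does not state how its values were computed, so the sketch assumes that definition. The function name `top_logit_effects`, the toy matrices, and the placeholder tokens are hypothetical, and the numbers are not the values listed above.

```python
import numpy as np

def top_logit_effects(decoder_row, unembed, vocab, k=10):
    """Project one feature's decoder direction onto the vocabulary (hypothetical helper).

    decoder_row: shape (d_model,), the feature's decoder vector (assumed)
    unembed:     shape (d_model, vocab_size), the model's unembedding matrix
    vocab:       list of token strings, length vocab_size
    Returns (negative, positive), each a list of k (token, value) pairs.
    """
    logits = decoder_row @ unembed          # direct effect on every token's logit
    order = np.argsort(logits)              # indices sorted by ascending value
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary (made-up values)
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
unembed = np.array([[0.5, 0.25, -0.25, -0.5],
                    [0.5, 0.5,   0.0,   0.0]])
decoder_row = np.array([1.0, 0.5])

neg, pos = top_logit_effects(decoder_row, unembed, vocab, k=2)
print(pos)  # [('tok_a', 0.75), ('tok_b', 0.5)]
print(neg)  # [('tok_d', -0.5), ('tok_c', -0.25)]
```

Because the projection is linear, these values only describe the feature's direct effect on the output logits. They say nothing about which inputs make the feature fire.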
    Activation Density 0.009%
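Activation density is usually reported as the share of sampled token positions on which the feature fires. The page does not say which dataset or firing threshold was used, so this minimal sketch makes two assumptions: a flat array of the feature's activations and a threshold of zero. The helper `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: the feature fires at 1 of 20,000 sampled token positions
acts = np.zeros(20_000)
acts[123] = 3.2
print(f"{activation_density(acts):.3f}%")  # 0.005%
```

Under this definition, the reported 0.009% means the feature fires on roughly 9 of every 100,000 sampled tokens.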

    No Known Activations