INDEX
    Explanations

    mentions of specific names, titles, or entities

    references to specific organizations, food items, and individuals

    Negative Logits
    Token         Logit
    "ãĥ¥"         -0.79
    "kees"        -0.75
    "ocular"      -0.68
    "£"           -0.66
    "iltration"   -0.64
    " allowance"  -0.61
    "WAYS"        -0.60
    " NHS"        -0.60
    "ffen"        -0.60
    "eral"        -0.59
    Positive Logits
    Token         Logit
    "aic"         0.89
    "y"           0.88
    "ments"       0.85
    "spe"         0.79
    "Ģ"           0.77
    "sonian"      0.76
    "¯"           0.76
    "teenth"      0.75
    "pillar"      0.75
    "ors"         0.74
    Activation Density: 0.029%

    No Known Activations