INDEX
    Explanations

    mentions of specific names or terms with varied linguistic structures

    proper nouns, especially names and brands

    Negative Logits
     BI            -0.55
    ¶ħ             -0.53
     CONT          -0.52
    semb           -0.51
    taboola        -0.51
     CONTR         -0.50
    tags           -0.50
     ASP           -0.49
    (whitespace)   -0.49
     recru         -0.48
    Positive Logits
    unia           0.67
    idia           0.67
    hess           0.66
    veland         0.61
    REDACTED       0.60
    ë              0.58
    enges          0.58
    ragon          0.57
    inces          0.57
    ledge          0.56
    Activation Density: 0.623%

    No Known Activations