    Explanations

    proper nouns, particularly names and titles in a scientific context

    Negative Logits
     ویکی‌پدی            -0.91
    featureID            -0.83
     autorytatywna       -0.76
     yym                 -0.72
    webElementXpaths     -0.70
    Spoljašnje           -0.69
    aarrggbb             -0.69
     queſta              -0.69
     Infórmanos          -0.68
     Wikimedijinoj       -0.68
    Positive Logits
    );                   0.38
    ).                   0.35
    Thank                0.33
     Group               0.32
    ↵↵                   0.32
                         0.32
    ];                   0.31
     Davidson            0.31
     San                 0.31
    <eos>                0.31
    Activation Density: 0.002%

    No Known Activations