INDEX
    Explanations

    names of people, especially those involved in political or public affairs

    names, particularly those of notable individuals

    Head Attr Weights
    0:0.08
    1:0.03
    2:0.18
    3:0.07
    4:0.16
    5:0.05
    6:0.03
    7:0.04
    8:0.05
    9:0.17
    10:0.06
    11:0.03
    Negative Logits
    acea            -1.37
     independents   -1.24
    idays           -1.22
    ModLoader       -1.21
                    -1.19
    Topics          -1.11
    anical          -1.11
    Tang            -1.10
    Reviewer        -1.10
    CLE             -1.09
    Positive Logits
    ğ       1.48
    uty     1.32
    iste    1.29
    uve     1.27
    igham   1.27
            1.26
    ault    1.24
    uten    1.22
    chal    1.22
     Fey    1.18
    Act Density 0.004%

    No Known Activations