INDEX
    Explanations

    specific nouns related to a group or category

    Head Attr Weights
    0:0.07
    1:0.07
    2:0.07
    3:0.08
    4:0.08
    5:0.08
    6:0.07
    7:0.08
    8:0.09
    9:0.11
    10:0.07
    11:0.08
    Negative Logits
    Guest        -2.53
    lycer        -2.44
    Alice        -2.33
    mble         -2.22
    Scene        -2.20
    Alias        -2.19
    ricia        -2.17
    Jess         -2.16
    ERY          -2.16
    Neigh        -2.14
    Positive Logits
     feared      2.04
    の�          2.00
     Clash       1.99
     dominating  1.95
    Football     1.94
     NFL         1.92
     Appeal      1.91
                 1.91
     fatig       1.90
     challenges  1.90
    Act Density 0.000%

    No Known Activations