INDEX
    Explanations
    No Explanations Found
    NEGATIVE LOGITS
    Token             Logit
    " ACTIONS"        -0.78
    "venge"           -0.73
    "idepress"        -0.66
    "lishes"          -0.64
    " 4090"           -0.64
    " consensual"     -0.63
    " ãĤ"             -0.61
    " derivative"     -0.60
    "spec"            -0.59
    " ç¥ŀ"            -0.58
    POSITIVE LOGITS
    Token             Logit
    "burgh"           0.75
    " Bulgar"         0.71
    "ancers"          0.65
    "halla"           0.64
    " Antiqu"         0.64
    "EH"              0.64
    " Tribe"          0.64
    " Undead"         0.63
    " Balk"           0.63
    "Shape"           0.63
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.