INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.04
    1:0.05
    2:0.13
    3:0.10
    4:0.12
    5:0.05
    6:0.08
    7:0.10
    8:0.06
    9:0.06
    10:0.08
    11:0.07
    Negative Logits
    " novel"       -1.81
    " imagined"    -1.80
    " thought"     -1.68
    " gamb"        -1.68
    " ourselves"   -1.67
    " fiction"     -1.66
    " fan"         -1.61
    " innov"       -1.60
    " adapt"       -1.60
    " unravel"     -1.59
    Positive Logits
    "Alert"        1.88
    "saf"          1.87
    "retty"        1.77
    " Sv"          1.71
    " Skydragon"   1.71
    "龍喚士"       1.71
    " Attend"      1.69
    "ategory"      1.68
    "lvl"          1.67
    " Regist"      1.65
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.