INDEX
    Explanations

    themes related to consequences, both positive and negative, in various contexts

    Head Attr Weights
    0:0.03
    1:0.02
    2:0.10
    3:0.17
    4:0.12
    5:0.04
    6:0.22
    7:0.07
    8:0.03
    9:0.04
    10:0.05
    11:0.05
    Negative Logits
    "Rated"          -1.65
    " Annotations"   -1.62
    "fleet"          -1.56
    "aido"           -1.44
    " Nanto"         -1.41
    "uyomi"          -1.35
    "psey"           -1.34
    "reply"          -1.33
    " [+"            -1.31
    " Rico"          -1.27
    Positive Logits
    " bang"          1.41
    " ourselves"     1.36
    " backward"      1.34
    " democrat"      1.28
    " perfection"    1.24
    " essentials"    1.22
    " apples"        1.22
    " peripher"      1.21
    " convenient"    1.19
    " backwards"     1.19
    Act Density 0.026%

    No Known Activations