INDEX
    Explanations

    mathematical symbols and related text

    phrases related to negative consequences or effects

    Negative Logits
    Token                   Logit
    " guiActiveUnfocused"   -0.73
    " Tid"                  -0.70
    " scatter"              -0.66
    " Dirt"                 -0.63
    " Libyan"               -0.61
    " FAR"                  -0.60
    " Golem"                -0.59
    " FAT"                  -0.59
    " Belg"                 -0.57
    " subur"                -0.57
    Positive Logits
    Token                   Logit
    "should"                0.87
    "¹"                     0.85
    "-|"                    0.84
    "¢"                     0.83
    "could"                 0.83
    "§"                     0.79
    "£"                     0.79
    "catentry"              0.78
    "âĢķ"                   0.78
    "there"                 0.78
    Activation Density: 0.575%

    No Known Activations