INDEX
    Explanations

    technical language related to the physical world

    instances of requests or prompts related to actions or decisions made by individuals or groups

    Negative Logits
    Token          Logit
    "é¾"           -0.81
    "CLUD"         -0.76
    "OND"          -0.72
    "hest"         -0.65
    "foundland"    -0.65
    "IQ"           -0.64
    "worthy"       -0.62
    "URI"          -0.58
    "KN"           -0.58
    "Condition"    -0.57
    Positive Logits
    Token           Logit
    "Instead"       1.15
    " Instead"      1.08
    "instead"       1.07
    " instead"      0.99
    " Rather"       0.86
    "Rather"        0.83
    " preferring"   0.78
    " anymore"      0.77
    " opting"       0.75
    " let"          0.75
    Activation Density: 0.094%

    No Known Activations