INDEX
    Explanations

    references to personal thoughts, feelings, and experiences

    Negative Logits
    Token            Logit
    "ilog"           -0.16
    " reck"          -0.15
    " wondered"      -0.15
    " suy"           -0.14
    "duk"            -0.14
    "VICE"           -0.14
    " wonder"        -0.14
    "opus"           -0.14
    "_try"           -0.14
    "ضا"             -0.14

    Positive Logits
    Token            Logit
    " mentioned"      0.24
    " Mention"        0.22
    " mention"        0.21
    "mentioned"       0.21
    " mentioning"     0.21
    "mention"         0.20
    " disclaimer"     0.18
    " mentions"       0.18
    " touched"        0.17
    "claimer"         0.16

    Activation Density: 0.148%

    No Known Activations