INDEX
    Explanations

    fear, hesitation, then action

    NEGATIVE LOGITS
    …,              0.94
    Additionally    0.91
    ...             0.89
    ...),           0.86
    ...).           0.84
    ,               0.83
    。,             0.82
    Также           0.80
    ...,            0.80
    므로            0.77
    POSITIVE LOGITS
     yeah           1.12
     oblivious      1.10
     yes            1.08
     unable         1.04
     powerless      0.95
     refusing       0.95
     unaware        0.93
     bathed         0.93
     afraid         0.93
     humbled        0.92
    Act Density 0.366%

    No Known Activations