INDEX
    Explanations

    words or phrases associated with truth and reality

    Negative Logits
    lean
    -0.16
     ReturnType
    -0.15
    ーター
    -0.15
     Už
    -0.15
    ータ
    -0.15
    トリ
    -0.15
     tent
    -0.14
    tridge
    -0.14
    .Logic
    -0.14
    rous
    -0.14
    Positive Logits
    ोद
    0.17
    fully
    0.16
    _escape
    0.15
    .Tween
    0.15
     power
    0.15
     sat
    0.14
     Hack
    0.14
    ilde
    0.14
    assen
    0.14
    ijo
    0.14
    Activation Density: 0.139%

    No Known Activations