INDEX
    Explanations

    phrases that introduce examples or explanations

    Negative Logits
    Token         Logit
    "kou"         -0.16
    "esin"        -0.15
    " opinion"    -0.14
    "pra"         -0.14
    " flex"       -0.14
    "decess"      -0.14
    " Opinion"    -0.14
    "logical"     -0.14
    " consult"    -0.14
    " towers"     -0.13
    Positive Logits
    Token         Logit
    "ril"         0.15
    " ÙħØ«ÙĦا"     0.15
    "aken"        0.14
    "매"          0.14
    ".xtext"      0.14
    " Emit"       0.14
    "åºľ"         0.14
    "oda"         0.13
    "ëħ"          0.13
    " Bout"       0.13
    Activation Density 0.065%

    No Known Activations