    NEGATIVE LOGITS
    u       1.26
    v       0.91
    𝓑       0.87
    з       0.83
    N       0.78
    ות      0.76
            0.74
    یس      0.72
    as      0.72
    ж       0.71
    POSITIVE LOGITS
     courageous  1.13
     brave       0.97
     valiant     0.96
     warrior     0.93
     bravely     0.92
     💪          0.87
     intrepid    0.84
     courage     0.81
     bravery     0.81
     heroic      0.80
    Activation Density 0.171%

    No Known Activations