INDEX
    Explanations

    results and consequences

    Negative Logits
    "𝑔"  2.64
    "𝑜"  2.62
    "ness"  2.48
    "synth"  2.35
    "𝑙"  2.27
    "l"  2.26
    "serializer"  2.17
    "currentPlayer"  2.17
    "𝑑"  2.15
    (token not rendered)  2.12
    Positive Logits
    "ه"  2.74
    (token not rendered)  2.66
    "л"  2.66
    " lecz"  2.53
    "м"  2.53
    "ة"  2.44
    "एं"  2.36
    "ו"  2.22
    "боро"  2.21
    "۰"  2.20
    Activation Density: 0.156%

    No Known Activations