INDEX
    Explanations

    occurrences of high numerical values associated with legal terminology or actions

    Negative Logits
    Token           Logit
    "Tikang"        -0.93
    "ed"            -0.87
    "ʺ"             -0.78
    "tron"          -0.72
    " ank"          -0.72
    " Kw"           -0.70
    " hoàng"        -0.69
    "bank"          -0.68
    " Oy"           -0.68
    " cron"         -0.65
    Positive Logits
    Token             Logit
    "↵↵↵"             1.85
    "↵↵↵↵"            1.48
    "↵↵↵↵↵↵"          1.38
    "↵↵↵↵↵"           1.35
    "↵↵↵↵↵↵↵"         1.27
    "↵↵↵↵↵↵↵↵"        1.20
    "にほんブログ村"   1.17
    "↵↵↵↵↵↵↵↵↵↵↵↵"    1.15
    "↵↵↵↵↵↵↵↵↵↵↵"     1.15
    "↵↵↵↵↵↵↵↵↵"       1.10
    Activation Density: 0.095%

    No Known Activations