INDEX
    Explanations

    titles or headings

    specific high-frequency characters or symbols, particularly the character 'Ŀ'
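Several tokens on this page, including 'Ŀ' and most of the positive-logit entries, appear in the byte-to-Unicode form that GPT-2-style byte-level BPE vocabularies use, not as decoded text. The page does not name the tokenizer, so it is an assumption that this feature's model uses the GPT-2 byte map. The sketch below inverts that map to recover each token's raw bytes. `bytes_to_unicode` mirrors the function of that name in OpenAI's GPT-2 `encoder.py`. `token_to_bytes` is a hypothetical helper. Under this assumption, 'Ŀ' is the single raw byte 0x9D, a UTF-8 continuation byte rather than a standalone character.

```python
from typing import Dict


def bytes_to_unicode() -> Dict[int, str]:
    """Map each byte 0-255 to the printable character GPT-2's byte-level BPE uses for it."""
    # Bytes that are already printable keep their own Latin-1 character.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (controls, space, 0x7F-0xA0, 0xAD) are shifted to U+0100 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Hypothetical helper: invert the map to recover a token's raw bytes.
_BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    return bytes(_BYTE_DECODER[ch] for ch in token)


if __name__ == "__main__":
    for tok in ["Ŀ", "ï¸ı", "âĻ¥", "é¾į", "ï¸"]:
        raw = token_to_bytes(tok)
        # errors="replace" shows incomplete UTF-8 sequences as U+FFFD.
        print(repr(tok), raw, raw.decode("utf-8", errors="replace"))
```

Decoded this way, 'ï¸ı' is U+FE0F (the emoji variation selector), 'âĻ¥' is '♥', 'é¾į' is '龍', and each 'âĶĢ' is the box-drawing character '─'. 'ï¸' and 'âĪ' are incomplete UTF-8 prefixes, which fits the "symbols" explanation.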

    Negative Logits
        " disadvant"  -0.85
        " psychiat"   -0.70
        " condem"     -0.70
        " contrace"   -0.69
        " ponder"     -0.69
        " behavi"     -0.69
        " likeness"   -0.68
        " unemploy"   -0.68
        " obser"      -0.67
        " floppy"     -0.67
    Positive Logits
        "ï¸ı"         0.96
        "°"           0.92
        "¯"           0.86
        "º"           0.86
        "ï¸"          0.85
        "é¾į"         0.81
        "âĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢ"  0.80
        "âĪ"          0.79
        "âĻ¥"         0.79
        "log"         0.79
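Lists like these are commonly read as the feature's direct effect on the output logits. In that reading, the feature's decoder direction is projected through the model's unembedding matrix, and the vocabulary is sorted by the result. The page does not say exactly how these values were computed, for example whether the final layer norm is folded in. The sketch below is therefore a minimal version under that assumption. `decoder_dir`, `w_u`, `logit_effects`, and `top_k_tokens` are hypothetical names, and the toy numbers are not taken from this feature.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a d_model-sized direction through W_U (shape d_model x vocab)."""
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_k_tokens(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most positive, most negative) token/effect pairs, strongest first."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (illustrative values only).
    vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
    w_u = [[0.9, -0.7, 0.8, -0.6],
           [0.1, 0.2, -0.1, 0.0]]
    decoder_dir = [1.0, 0.0]
    pos, neg = top_k_tokens(logit_effects(decoder_dir, w_u), vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

A real model would use a NumPy or PyTorch matrix product instead of the nested loop. The pure-Python form just keeps the projection explicit.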
    Activation Density 0.164%
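Activation density is usually reported as the share of tokens in a sample corpus on which the feature's activation is above zero. At 0.164%, that is roughly 164 firings per 100,000 tokens. The sample size and threshold behind this page's figure are not stated, so the sketch below assumes a strict `> 0` threshold. `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


if __name__ == "__main__":
    # Toy data: the feature fires on 2 of 8 token positions.
    acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 4.2, 0.0]
    print(f"{activation_density(acts):.3f}%")
```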

    No Known Activations