Explanations
specific punctuation or formatting elements
Negative Logits
Goy      -0.71
émon     -0.66
Dol      -0.66
tish     -0.64
climate  -0.63
Hau      -0.63
FORME    -0.63
phi      -0.62
umph     -0.61
Ə        -0.61
Positive Logits
])     1.52
}))    1.47
]")]   1.46
})     1.44
€)     1.37
())    1.37
'])    1.36
>)     1.35
))     1.35
__)    1.33
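Lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below illustrates that idea under stated assumptions: `w_dec_row` (the feature's decoder vector), `W_U` (unembedding, shape `d_model x vocab_size`) and `vocab` are hypothetical inputs, and any final LayerNorm is assumed to be ignored or already folded into `W_U`. It is not necessarily how this dashboard computes its values.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Direct logit effect of one feature (assumed method, not the dashboard's).

    w_dec_row: (d_model,) decoder direction of the feature (hypothetical input)
    W_U:       (d_model, vocab_size) unembedding matrix, LayerNorm assumed folded in
    vocab:     list of token strings, indexed like the columns of W_U
    Returns (negative, positive): the k lowest and k highest (token, logit) pairs.
    """
    logits = np.asarray(w_dec_row) @ np.asarray(W_U)  # shape: (vocab_size,)
    order = np.argsort(logits)                          # ascending order of logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage: a 4-dim model with a 6-token vocabulary.
w = np.array([1.0, 0.0, 0.0, 0.0])
W_U = np.zeros((4, 6))
W_U[0] = [0.5, -1.0, 2.0, 0.0, -0.3, 1.5]
neg, pos = top_logit_effects(w, W_U, ["a", "b", "c", "d", "e", "f"], k=2)
print(neg)  # [('b', -1.0), ('e', -0.3)]
print(pos)  # [('c', 2.0), ('f', 1.5)]
```

Because the projection is linear, a feature whose positive logits are dominated by closing brackets (as here) pushes the model toward emitting those tokens whenever it fires, independent of context.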
Activation Density: 0.321%
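Activation density is usually read as the fraction of sampled tokens on which the feature fires, so 0.321% corresponds to roughly 3 tokens in every 1,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the dashboard's exact threshold and token sample are not stated here); `activation_density` is a hypothetical helper:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    acts: 1-D array of this feature's activations over a token sample.
    The threshold of 0.0 is an assumption, not taken from the dashboard.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Toy usage: the feature fires on 2 of 8 tokens.
d = activation_density([0.0, 0.2, 0.0, 0.0, 1.3, 0.0, 0.0, 0.0])
print(f"{d:.3%}")  # 25.000%
```

For a sparse autoencoder feature, a density well below 1% like the one above is typical of a narrowly specialized feature.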