Explanations
phrases indicating self-awareness and reflection
Preceding end-of-turn tokens
multilingual words
Negative Logits
WriteLiteral      -0.45
featureID         -0.45
stdc              -0.43
newOwner          -0.43
WriteAttribute    -0.40
numerusform       -0.40
CreateTagHelper   -0.40
distanciation     -0.39
dist              -0.39
Diwedd            -0.39
Positive Logits
Diweddarwch       0.54
قایناقلار         0.45
Glej              0.45
käyttö            0.42
CppCodeGen        0.41
désolés           0.41
˾                 0.40
tantum            0.39
Vezi              0.39
úgó               0.39
Activations Density: 0.417%