INDEX
Explanations
symbols and punctuation marks
Negative Logits
Token    Logit
ers      -0.73
te       -0.64
الشرق    -0.61
osh      -0.60
er       -0.60
Dol      -0.60
Thy      -0.59
(        -0.59
dol      -0.58
Goy      -0.57
Positive Logits
Token         Logit
}))           1.29
]")]          1.24
}))           1.17
])            1.16
referenties   1.12
})]           1.11
'])           1.10
})            1.09
"]))          1.09
])]           1.08
Activations Density 0.860%