Explanations
pronouns and their usage in various contexts
Negative Logits
oui      -0.16
Ngh      -0.14
ring     -0.14
note     -0.14
Holl     -0.14
imb      -0.13
gener    -0.13
atti     -0.13
åº       -0.13
ilian    -0.13
Positive Logits
umbn       0.17
ertino     0.16
eldorf     0.15
prite      0.15
UnderTest  0.15
ưỡng       0.15
uzzi       0.14
มà¸Ļ       0.14
byn        0.14
ircon      0.14
Activation density: 0.437%