Explanations
references to characters and titles of nobility or authority
Negative Logits
متعلقه  -1.24
itſelf  -1.20
الرياضيه  -1.17
himſelf  -1.17
themſelves  -1.14
ſelves  -1.14
صوتيه  -1.14
myſelf  -1.08
Theſe  -1.08
ValueStyle  -1.07
Positive Logits
,  0.59
0.50
did  0.47
k  0.45
<eos>  0.44
_  0.43
skar  0.43
↵  0.43
it  0.43
</b>  0.43
Activations Density 0.100%