Explanations
question marks and indications of inquiries or uncertainty
Negative Logits
Token     Logit
aison     -0.17
979       -0.15
.sf       -0.15
ĵį        -0.14
za        -0.14
atar      -0.14
dag       -0.13
ree       -0.13
ologi     -0.13
iously    -0.13
Positive Logits
Token     Logit
none      0.33
none      0.31
None      0.28
None      0.27
correct   0.26
_none     0.26
answer    0.25
.none     0.24
NONE      0.23
NONE      0.23
Activations Density: 0.007%