Explanations
phrases that indicate a focus on examples or instructional content
Negative Logits
vier      -0.16
رس        -0.15
quirer    -0.14
переп     -0.14
aki       -0.13
yar       -0.13
dzi       -0.13
iro       -0.13
ieri      -0.13
scri      -0.13
Positive Logits
example    0.35
unately    0.27
instance   0.27
exemple    0.26
Example    0.25
cing       0.25
example    0.25
-example   0.21
مثال       0.20
details    0.20
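The logit lists above rank output tokens by how strongly this feature pushes their logits down (negative) or up (positive). One common way to build such lists for a learned feature, for example one from a sparse autoencoder, is to project the feature's decoder direction through the model's unembedding matrix and sort the results. This page does not say how its values were computed, so treat the following as a minimal sketch under that assumption; `logit_effects` and `top_and_bottom` are hypothetical helpers and the toy weights are invented.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[Tuple[str, float]]:
    """Return (token, effect) pairs; `unembed` has shape [d_model][vocab_size].

    Hypothetical helper: a feature's direct effect on a token's logit is taken
    to be the dot product of its decoder direction with that token's
    unembedding column.
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed rows must match decoder_dir length (d_model)")
    effects = []
    for t, token in enumerate(vocab):
        score = sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        effects.append((token, score))  # a list, so repeated token strings survive
    return effects


def top_and_bottom(effects: Sequence[Tuple[str, float]], k: int):
    """Split into the k most negative and the k most positive tokens."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]                    # ascending: most negative first
    positive = list(reversed(ranked))[:k]    # descending: most positive first
    return negative, positive


# Toy model (made-up values): d_model = 2, vocabulary of 3 tokens.
decoder_dir = [1.0, 0.5]
unembed = [[0.2, -0.1, 0.0],
           [0.3, 0.0, -0.4]]
vocab = ["example", "vier", "instance"]

neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=1)
print(neg, pos)
```

Returning a list of pairs rather than a dictionary keeps duplicate-looking entries apart, which matters here: the positive list shows `example` twice, most likely two distinct tokens (for instance with and without a leading space) whose whitespace was lost in extraction.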
Activation Density: 0.063%
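Activation density is usually read as the share of sampled tokens on which the feature fires. Assuming that definition (a strictly positive activation, over whatever token sample the dashboard used, which this page does not state), 0.063% corresponds to roughly 63 firing tokens per 100,000. A minimal sketch, where `activation_density` is a hypothetical helper and the sample is made up:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature's activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Made-up sample: 63 firing tokens out of 100,000.
sample = [0.0] * 99_937 + [1.2] * 63
print(f"{activation_density(sample):.3f}%")  # prints "0.063%"
```

The `threshold` parameter is an assumption too: some tools count any nonzero activation, others apply a small cutoff to ignore numerical noise.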