INDEX
Explanations
verbs and phrases related to measurement and evaluation
Negative Logits
/her       -0.16
umer       -0.16
NESS       -0.15
arde       -0.15
ness       -0.15
åĹ         -0.15
ewan       -0.14
Ń          -0.14
Fade       -0.13
yaptıģı    -0.13
Positive Logits
собой        0.25
themselves   0.23
不了         0.21
(ed          0.19
differently  0.17
ingly        0.17
rowse        0.17
iani         0.16
сят          0.16
¶Į           0.16
Activation density: 0.211%