Explanations
verbs related to communication and expression
Negative Logits
son       -0.58
-         -0.54
B         -0.54
tal       -0.53
oprot     -0.51
一些      -0.49
sof       -0.48
式        -0.48
ecore     -0.46
sp        -0.46
Positive Logits
itself    0.94
itself    0.93
itſelf    0.93
')")      0.82
++)       0.81
")[       0.80
'%(       0.79
resents   0.79
')):      0.79
')        0.78
Activations Density 0.651%